From help-file :
CSZ packet scheduler CONFIG_NET_SCH_CSZ Say Y here if you want to use the Clark-Shenker-Zhang (CSZ) packet scheduling algorithm for some of your network devices. At the moment, this is the only algorithm that can guarantee service for real-time applications (see the top of net/sched/sch_csz.c for details and references about the algorithm). Note: this scheduler is currently broken. This code is also available as a module called sch_csz.o ( = code which can be inserted in and removed from the running kernel whenever you want). If you want to compile it as a module, say M here and read Documentation/modules.txt.
From source-files :
Clark-Shenker-Zhang algorithm. ======================================= SOURCE. David D. Clark, Scott Shenker and Lixia Zhang "Supporting Real-Time Applications in an Integrated Services Packet Network: Architecture and Mechanism". CBQ presents a flexible universal algorithm for packet scheduling, but it has pretty poor delay characteristics. Round-robin scheduling and link-sharing goals apparently contradict minimization of network delay and jitter. Moreover, correct handling of predictive flows seems to be impossible in CBQ. CSZ presents a more precise but less flexible and less efficient approach. As I understand it, the main idea is to create WFQ flows for each guaranteed service and to allocate the rest of bandwith to dummy flow-0. Flow-0 comprises the predictive services and the best effort traffic; it is handled by a priority scheduler with the highest priority band allocated for predictive services, and the rest --- to the best effort packets. Note that in CSZ flows are NOT limited to their bandwidth. It is supposed that the flow passed admission control at the edge of the QoS network and it doesn't need further shaping. Any attempt to improve the flow or to shape it to a token bucket at intermediate hops will introduce undesired delays and raise jitter. At the moment CSZ is the only scheduler that provides true guaranteed service. Another schemes (including CBQ) do not provide guaranteed delay and randomize jitter. There is a proof (Sally Floyd), that delay can be estimated by a IntServ compliant formula. This result is true formally, but it is wrong in principle. It takes into account only round-robin delays, ignoring delays introduced by link sharing i.e. overlimiting. Note that temporary overlimits are inevitable because real links are not ideal, and the real algorithm must take this into account. ALGORITHM. --- Notations. $B$ is link bandwidth (bits/sec). $I$ is set of all flows, including flow $0$. Every flow $a \in I$ has associated bandwidth slice $r_a < 1$ and $\sum_{a \in I} r_a = 1$. --- Flow model. Let $m_a$ is the number of backlogged bits in flow $a$. The flow is {\em active}, if $m_a > 0$. This number is a discontinuous function of time; when a packet $i$ arrives: \[ m_a(t_i+0) - m_a(t_i-0) = L^i, \] where $L^i$ is the length of the arrived packet. The flow queue is drained continuously until $m_a == 0$: \[ {d m_a \over dt} = - { B r_a \over \sum_{b \in A} r_b}. \] I.e. flow rates are their allocated rates proportionally scaled to take all available link bandwidth. Apparently, it is not the only possible policy. F.e. CBQ classes without borrowing would be modelled by: \[ {d m_a \over dt} = - B r_a . \] More complicated hierarchical bandwidth allocation policies are possible, but unfortunately, the basic flow equations have a simple solution only for proportional scaling. --- Departure times. We calculate the time until the last bit of packet is sent: \[ E_a^i(t) = { m_a(t_i) - \delta_a(t) \over r_a }, \] where $\delta_a(t)$ is number of bits drained since $t_i$. We have to evaluate $E_a^i$ for all queued packets, then find the packet with minimal $E_a^i$ and send it. This sounds good, but direct implementation of the algorithm is absolutely infeasible. Luckily, if flow rates are scaled proportionally, the equations have a simple solution. The differential equation for $E_a^i$ is \[ {d E_a^i (t) \over dt } = - { d \delta_a(t) \over dt} { 1 \over r_a} = { B \over \sum_{b \in A} r_b} \] with initial condition \[ E_a^i (t_i) = { m_a(t_i) \over r_a } . \] Let's introduce an auxiliary function $R(t)$: --- Round number. Consider the following model: we rotate over active flows, sending $r_a B$ bits from every flow, so that we send $B \sum_{a \in A} r_a$ bits per round, that takes $\sum_{a \in A} r_a$ seconds. Hence, $R(t)$ (round number) is a monotonically increasing linear function of time when $A$ is not changed \[ { d R(t) \over dt } = { 1 \over \sum_{a \in A} r_a } \] and it is continuous when $A$ changes. The central observation is that the quantity $F_a^i = R(t) + E_a^i(t)/B$ does not depend on time at all! $R(t)$ does not depend on flow, so that $F_a^i$ can be calculated only once on packet arrival, and we need not recalculate $E$ numbers and resorting queues. The number $F_a^i$ is called finish number of the packet. It is just the value of $R(t)$ when the last bit of packet is sent out. Maximal finish number on flow is called finish number of flow and minimal one is "start number of flow". Apparently, flow is active if and only if $F_a \leq R$. When a packet of length $L_i$ bit arrives to flow $a$ at time $t_i$, we calculate $F_a^i$ as: If flow was inactive ($F_a < R$): $F_a^i = R(t) + {L_i \over B r_a}$ otherwise $F_a^i = F_a + {L_i \over B r_a}$ These equations complete the algorithm specification. It looks pretty hairy, but there is a simple procedure for solving these equations. See procedure csz_update(), that is a generalization of the algorithm from S. Keshav's thesis Chapter 3 "Efficient Implementation of Fair Queeing". NOTES. * We implement only the simplest variant of CSZ, when flow-0 is a explicit 4band priority fifo. This is bad, but we need a "peek" operation in addition to "dequeue" to implement complete CSZ. I do not want to do that, unless it is absolutely necessary. * A primitive support for token bucket filtering presents itself too. It directly contradicts CSZ, but even though the Internet is on the globe ... :-) "the edges of the network" really exist. BUGS. * Fixed point arithmetic is overcomplicated, suboptimal and even wrong. Check it later.