Dear reader, I'm not updating these pages anymore. If you have tc or ip related questions, you can post them on the LARTC mailing list.



From help-file :
CSZ packet scheduler
CONFIG_NET_SCH_CSZ
  Say Y here if you want to use the Clark-Shenker-Zhang (CSZ) packet
  scheduling algorithm for some of your network devices. At the
  moment, this is the only algorithm that can guarantee service for
  real-time applications (see the top of net/sched/sch_csz.c for
  details and references about the algorithm).
 
  Note: this scheduler is currently broken.
 
  This code is also available as a module called sch_csz.o ( = code
  which can be inserted in and removed from the running kernel
  whenever you want). If you want to compile it as a module, say M
  here and read Documentation/modules.txt.

From source-files :
    Clark-Shenker-Zhang algorithm.
    =======================================
 
    SOURCE.
 
    David D. Clark, Scott Shenker and Lixia Zhang
    "Supporting Real-Time Applications in an Integrated Services Packet
    Network: Architecture and Mechanism".
 
    CBQ presents a flexible universal algorithm for packet scheduling,
    but it has pretty poor delay characteristics.
    Round-robin scheduling and link-sharing goals
    apparently contradict minimization of network delay and jitter.
    Moreover, correct handling of predictive flows seems to be
    impossible in CBQ.
 
    CSZ presents a more precise but less flexible and less efficient
    approach. As I understand it, the main idea is to create
    WFQ flows for each guaranteed service and to allocate
    the rest of bandwith to dummy flow-0. Flow-0 comprises
    the predictive services and the best effort traffic;
    it is handled by a priority scheduler with the highest
    priority band allocated for predictive services, and the rest ---
    to the best effort packets.
 
    Note that in CSZ flows are NOT limited to their bandwidth.  It
    is supposed that the flow passed admission control at the edge
    of the QoS network and it doesn't need further shaping. Any
    attempt to improve the flow or to shape it to a token bucket
    at intermediate hops will introduce undesired delays and raise
    jitter.
 
    At the moment CSZ is the only scheduler that provides
    true guaranteed service. Another schemes (including CBQ)
    do not provide guaranteed delay and randomize jitter.
    There is a proof (Sally Floyd), that delay
    can be estimated by a IntServ compliant formula.
    This result is true formally, but it is wrong in principle.
    It takes into account only round-robin delays,
    ignoring delays introduced by link sharing i.e. overlimiting.
    Note that temporary overlimits are inevitable because
    real links are not ideal, and the real algorithm must take this
    into account.
 
        ALGORITHM.
 
    --- Notations.
 
    $B$ is link bandwidth (bits/sec).
 
    $I$ is set of all flows, including flow $0$.
    Every flow $a \in I$ has associated bandwidth slice $r_a < 1$ and
    $\sum_{a \in I} r_a = 1$.
 
    --- Flow model.
 
    Let $m_a$ is the number of backlogged bits in flow $a$.
    The flow is {\em active}, if $m_a > 0$.
    This number is a discontinuous function of time;
    when a packet $i$ arrives:
    \[
    m_a(t_i+0) - m_a(t_i-0) = L^i,
    \]
    where $L^i$ is the length of the arrived packet.
    The flow queue is drained continuously until $m_a == 0$:
    \[
    {d m_a \over dt} = - { B r_a \over \sum_{b \in A} r_b}.
    \]
    I.e. flow rates are their allocated rates proportionally
    scaled to take all available link bandwidth. Apparently,
    it is not the only possible policy. F.e. CBQ classes
    without borrowing would be modelled by:
    \[
    {d m_a \over dt} = - B r_a .
    \]
    More complicated hierarchical bandwidth allocation
    policies are possible, but unfortunately, the basic
    flow equations have a simple solution only for proportional
    scaling.
 
    --- Departure times.
 
    We calculate the time until the last bit of packet is sent:
    \[
    E_a^i(t) = { m_a(t_i) - \delta_a(t) \over r_a },
    \]
    where $\delta_a(t)$ is number of bits drained since $t_i$.
    We have to evaluate $E_a^i$ for all queued packets,
    then find the packet with minimal $E_a^i$ and send it.
 
    This sounds good, but direct implementation of the algorithm
    is absolutely infeasible. Luckily, if flow rates
    are scaled proportionally, the equations have a simple solution.
 
    The differential equation for $E_a^i$ is
    \[
    {d E_a^i (t) \over dt } = - { d \delta_a(t) \over dt} { 1 \over r_a} =
    { B \over \sum_{b \in A} r_b}
    \]
    with initial condition
    \[
    E_a^i (t_i) = { m_a(t_i) \over r_a } .
    \]
 
    Let's introduce an auxiliary function $R(t)$:
 
    --- Round number.
 
    Consider the following model: we rotate over active flows,
    sending $r_a B$ bits from every flow, so that we send
    $B \sum_{a \in A} r_a$ bits per round, that takes
    $\sum_{a \in A} r_a$ seconds.
 
    Hence, $R(t)$ (round number) is a monotonically increasing
    linear function of time when $A$ is not changed
    \[
    { d R(t) \over dt } = { 1 \over \sum_{a \in A} r_a }
    \]
    and it is continuous when $A$ changes.
 
    The central observation is that the quantity
    $F_a^i = R(t) + E_a^i(t)/B$ does not depend on time at all!
    $R(t)$ does not depend on flow, so that $F_a^i$ can be
    calculated only once on packet arrival, and we need not
    recalculate $E$ numbers and resorting queues.
    The number $F_a^i$ is called finish number of the packet.
    It is just the value of $R(t)$ when the last bit of packet
    is sent out.
 
    Maximal finish number on flow is called finish number of flow
    and minimal one is "start number of flow".
    Apparently, flow is active if and only if $F_a \leq R$.
 
    When a packet of length $L_i$ bit arrives to flow $a$ at time $t_i$,
    we calculate $F_a^i$ as:
 
    If flow was inactive ($F_a < R$):
    $F_a^i = R(t) + {L_i \over B r_a}$
    otherwise
    $F_a^i = F_a + {L_i \over B r_a}$
 
    These equations complete the algorithm specification.
 
    It looks pretty hairy, but there is a simple
    procedure for solving these equations.
    See procedure csz_update(), that is a generalization of
    the algorithm from S. Keshav's thesis Chapter 3
    "Efficient Implementation of Fair Queeing".
 
    NOTES.
 
    * We implement only the simplest variant of CSZ,
    when flow-0 is a explicit 4band priority fifo.
    This is bad, but we need a "peek" operation in addition
    to "dequeue" to implement complete CSZ.
    I do not want to do that, unless it is absolutely
    necessary.
 
    * A primitive support for token bucket filtering
    presents itself too. It directly contradicts CSZ, but
    even though the Internet is on the globe ... :-)
    "the edges of the network" really exist.
 
    BUGS.
 
    * Fixed point arithmetic is overcomplicated, suboptimal and even
    wrong. Check it later.