Dear reader, I'm not updating these pages anymore. If you have tc or ip related questions, you can post them on the LARTC mailing list.

PFIFO

PFIFO = Packet First In First Out

This is the standard qdisc. It's the standard root qdisc and the CBQ classes has standard a FIFO qdisc attached. But this qdisc is not fair in sending packets and most of the time the users replace it with a better qdisc. The packets are send in the same order as they arrived from the networking code. The PFIFO qdisc can contain a limited number of packets.

example

To attach a fifo queue of 10 packets as the root qdisc:

tc qdisc add dev eth0 root pfifo limit 10

From mailing-list :

as you can see in fifo_init:

 if (sch->ops == &bfifo_qdisc_ops)
                         q->limit = sch->dev->tx_queue_len*sch->dev->mtu;
                 else    
                         q->limit = sch->dev->tx_queue_len;

The limit for the bfifo is in bytes and its default value is the devices tx queue length (configured i.e. by using ifconfig, usually about 100 I believe) times MTU. The limit for pfifo is in packets, default is the devices tx queue length.

BFIFO

The bfifo queue is like pfifo, but instead of containing a limited number of packets it will at contain a limited number of bytes. Don't know where it could be useful.

PRIO

This queue has 3 built-in bands. Each band uses a FIFO qdisc to send the packets. As first, everything is sent from band 0, then from band 1 and then from band 2. This can be used to give some real-time services a higher priority, like telnet or ssh.

To add the default prio qdisc as the root qdisc:

tc qdisc add dev eth0 root handle 1: prio

$ tc qdisc show dev eth0
qdisc prio 10: bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
$ tc class show dev eth0
class prio 1: parent 1: [UNKNOWN]
class prio 1: parent 1: [UNKNOWN]
class prio 1: parent 1: [UNKNOWN]

from kernel

SFQ (Stochastic Fairness Queue)

A SFQ queue sets up a hash table. This is used to map packets that belongs to the same conversation to FIFO queues. This hash tables is frequently reseted so there are less queues then possible flows. The FIFO queues may send one after one a packet. So the flows are threaten equally. This approach reduces the number of required FIFO queues. In case of IP packets, the source, destination address and protocol number is used to determine the different flows.

When you choose the hash table big enough, the unfairness is less because there are enough queues to handle the different flows. So for best performance, choose the hash table a little bigger then the active flows which is much less then all the possible flows.

As quantum we choose the MTU, perturb gives period of hash function perturbation in seconds.

from kernel
Other info : only one page but good explanation ps format or text format.

quantum: bytes
perturb: seconds

CBQ (Class Based Queueing)

CBQ is a qdiscs on which you can assign classes. Each classes owns a qdisc and this is standard a FIFO qdisc. But you can replace this FIFO qdisc by a CBQ qdiscs that contains classes, etc, ... .

I did a lot of testing with CBQ and you can find the results on the test page.

From everywhere I found this explanation about the available parameters:

Bandwidth is the maximum bandwidth of the device where the queue is attached. This can be a NIC or a class from another qdisc. For a root-qdisc, the bandwidth has to be the same as the bandwidth of the device where it's attached to and not the link bandwidth. All QOS-elements with the same major number, has to have the same bandwidth.
avpkt is average packet size. Used to determine transmission time : transmission time = avpkt/link-bandwidth. You have to lower this for real-time traffic since they use smaller packets (usual <400).
maxidle can be seen as the maximum credit that a class can reach when being underlimit for a period of time So the higher maxidle is, the more packets in burst the class may send when switching from underlimit to overlimit.
mpu is minimal packet size. For ethernet-like devices this is 64. Packets smaller then MPU will be resetted to MPU. Typical MTU = 64.
allot is MTU + MAC header (parameter used by the link sharing scheduler)
Allot is always more then avpk*3/2. From tc-source:
```
if (allot < (avpkt*3)/2)
   allot = (avpkt*3)/2;
```
minburst is the number of bytes that will be transmitted in the shortest burst.
maxburst measure allowed class burstiness (please, read S.Floyd and VJ papers) : this is the maximum of bytes that will be transmitted in the longest burst.
est 1sec 8sec means, that kernel will evaluate average rate on this class with period 1sec and time constant 8sec. This rate is viewed with "tc -s class ls dev $DEVICE"
weight should be set proportional to rate. You can use this to divide the traffic in the different classes. (more info about splitting) The weight determines the number of bytes that a class is allowed to send in a scheduling round.
A cell value of 8 means that the packet transmission time will be measured in terms of 8 bytes.
bounded means that the class cannot borrow unused bandwidth from its ancestors. (see CBQ tests)
isolated means that the class will not share bandwidth with with any of non-descendant classes ? ? ?

from kernel

RED (Random Early Drop)

RED has some extra smartness built in. When a TCP/IP session starts, neither end knows the amount of bandwidth available. So TCP/IP starts to transmit slowly and goes faster and faster, though limited by the latency at which ACKs return.

Once a link is filling up, RED starts dropping packets, which indicate to TCP/IP that the link is congested, and that it should slow down. The smart bit is that RED simulates real congestion, and starts to drop some packets some time before the link is entirely filled up. Once the link is completely saturated, it behaves like a normal policer.

from kernel

In order to use RED, you must decide on three parameters: Min, Max, and burst.
Min sets the minimum queue size in bytes before dropping will begin,
Max is a soft maximum that the algorithm will attempt to stay under,
and burst sets the maximum number of packets that can 'burst through'.
RED_min: Link multiplied with max. acceptable throughput (queue length in bytes)
RED_max: twice min, on slow links up to four times min (queue length in bytes)
RED_burst: (2*min+max)/(3*avpkt) should be efficient
RED_limit: queue size where RED becomes tail-drop, eight times max
RED_avpkt: average packet size

CSZ

from kernel

TBF (Token Bucket Filter)

You can attach a TBF qdisc to a class to limit the bandwidth. The packets are stored in a 'bucket' and are send at a certain frequency. When there are to much packets, they have to wait until they can be send.

It's possible to send a burst of packets and you can specify the burst at the command line of tc.

from kernel
Other info: very good paper about tbf (6 pages, 17 May 2001)

Description.
  The TBF implementation consists of a buffer (bucket),
  constantly filled by some virtual pieces of information
  called tokens, at specific rate (token rate). The most
  important parameter of the bucket is its size, that is
  number of tokens it can store.

  Each arriving token lets one incoming data packet of
  out the queue and is then deleted from the bucket.
  Associating this algorithm with the two flows --
  token and data, gives us three possible scenarios:

  -The data arrives into TBF at rate equal the rate of
   incoming tokens. In this case each incoming packet
   has its matching token and passes the queue without delay.
  -The data arrives into TBF at rate smaller than the token
   rate. Only some tokens are deleted at output of each
   data packet sent out the queue, so the tokens accumulate,
   up to the bucket size. The saved tokens can be then used
   to send data over the token rate, if short data burst occurs.
  -The data arrives into TBF at rate bigger than the token
   rate. In this case filter overrun occurs -- incoming data
   can be only sent out without loss until all accumulated
   tokens are used. After that, over-limit packets are dropped.

PRIO

from kernel

TEQL

from kernel

Short description:
------------------

                  +-------+   eth1   +-------+
                  |       |==========|       |
  'network 1' ----|   A   |          |   B   |---- 'network 2'
                  |       |==========|       |
                  +-------+   eth2   +-------+

   Router A and B are connected to each other with two links.
   The device to be set up is basically a roundrobbin distributor
   over eth1 and eth2, for sending packets. No data ever comes in
   over an teql device, that just appears on the 'raw' eth1 and eth2.
   But now we just have devices, we also need proper routing. One way
   to do this is to assign a /31 network to both links, and a /31 to
   the teql0 device as well. That means, we have to have a transfer
   network for this.

   Routing:  eth1_A: 10.0.0.0/31
             eth1_B: 10.0.0.1/31

             eth2_A: 10.0.0.2/31
             eth2_B: 10.0.0.3/31

             teql0_A: 10.0.0.4/31
             teql0_B: 10.0.0.5/31

             If you only have Linux boxes on each side, you can use /31
             as subnet and so you can avoid wasting IP-Addresses for
             broadcast and network addresses

   You have to set up the IP-Addresses accordingly. Your default gw
   should point to teql0.

   In order to use this feature TEQL must be enables in your kernel.
   You have to set up this framework on both sides (ie. Router A and B).

CAVEATS:
--------

  You have to disable Reverse Path Filtering! Ie. I guess you do not want
  to use this in a Firewall/DMZ area. :-)
  Then there is the nasty problem of packet reordering. Let's say 6 packets
  need to be sent from A to B - eth1 might get 1, 3 and 5. eth2 would then
  do 2, 4 and 6. In an ideal world, router B would receive this in order,
  1, 2, 3, 4, 5, 6. But the possibility is very real that the kernel gets
  it like this: 2, 1, 4, 3, 6, 5. The problem is that this confuses TCP/IP.
  While not a problem for links carrying many different TCP/IP sessions,
  you won't be able to to a bundle multiple links and get to ftp a single
  file lots faster, except when your receiving or sending OS is Linux,
  which is not easily shaken by some simple reordering.

WRR

from kernel