Dear reader, I'm not updating these pages anymore. If you have tc or ip related questions, you can post them on the LARTC mailing list.



Intro

On the LARTC mailing list, there was a long discussion about how a packet is handled by the kernel. Finally, there was a post by Leonardo Balliache that I copied onto this page. I hope this helps people to better understand how it all works.

I added extra info aboute the IMQ device, I hope I didn't make any mistakes. All info/updates/corrections are welcome.

Kernel Packet Traveling Diagram

                            Network
                    -----------+-----------
                               |
                  +--------------------------+
          +-------+-------+        +---------+---------+
          |    IPCHAINS   |        |      IPTABLES     |
          |     INPUT     |        |     PREROUTING    |
          +-------+-------+        | +-------+-------+ |
                  |                | |   conntrack   | |
                  |                | +-------+-------+ |
                  |                | |    mangle     | | <- MARK WRITE  
                  |                | +-------+-------+ |
                  |                | |      IMQ      | |
                  |                | +-------+-------+ |
                  |                | |      nat      | | <- DEST REWRITE
                  |                | +-------+-------+ |     DNAT or REDIRECT or DE-MASQUERADE
                  |                +---------+---------+
                  +------------+-------------+
                               |
                       +-------+-------+
                       |      QOS      |
                       |    INGRESS    |
                       +-------+-------+
                               |
         packet is for +-------+-------+ packet is for
          this machine |     INPUT     | another address
        +--------------+    ROUTING    +--------------+
        |              |    + PDBB     |              |
        |              +---------------+              |
+-------+-------+                                     |
|   IPTABLES    |                                     |
|     INPUT     |                                     |
| +-----+-----+ |                                     |
| |   mangle  | |                                     |
| +-----+-----+ |                                     |
| |   filter  | |                                     |
| +-----+-----+ |                                     |
+-------+-------+                                     |
        |                               +---------------------------+
+-------+-------+                       |                           |
|     Local     |               +-------+-------+           +-------+-------+
|    Process    |               |    IPCHAINS   |           |    IPTABLES   |
+-------+-------+               |    FORWARD    |           |    FORWARD    |
        |                       +-------+-------+           | +-----+-----+ |
+-------+-------+                       |                   | |  mangle   | | <- MARK WRITE
|    OUTPUT     |                       |                   | +-----+-----+ |
|    ROUTING    |                       |                   | |  filter   | |
+-------+-------+                       |                   | +-----+-----+ |
        |                               |                   +-------+-------+
+-------+-------+                       |                           |
|    IPTABLES   |                       +---------------------------+
|     OUTPUT    |                                     |
| +-----------+ |                                     |
| | conntrack | |                                     |
| +-----+-----+ |                                     |
| |   mangle  | | <- MARK WRITE                       |
| +-----+-----+ |                                     |
| |    nat    | | <-DEST REWRITE                      |
| +-----+-----+ |     DNAT or REDIRECT                |
| |   filter  | |                                     |
| +-----+-----+ |                                     |
+-------+-------+                                     |
        |                                             |
        +----------------------+----------------------+
                               |
                  +------------+------------+
                  |                         |
          +-------+-------+       +---------+---------+
          |    IPCHAINS   |       |      IPTABLES     |
          |     OUTPUT    |       |    POSTROUTING    |
          +-------+-------        | +-------+-------+ |
                  |               | |    mangle     | | <- MARK WRITE  
                  |               | +-------+-------+ |
                  |               | |      nat      | | <- SOURCE REWRITE
                  |               | +-------+-------+ |      SNAT or MASQUERADE
                  |               | |      IMQ      | |
                  |               | +-------+-------+ |
                  |               +---------+---------+
                  +------------+------------+
                               |
                        +------+------+
                        |     QOS     |
                        |    EGRESS   |
                        +------+------+
                               |
                    -----------+-----------
                            Network

My remarks on the diagram

Leonardo notes

Updates

I remove conntrack from POSTROUTING. More info on https://www.frozentux.net/iptables-tutorial/iptables-tutorial.html. See Chapter 7:

"All connection tracking is handled in the PREROUTING chain, except locally generated packets which are handled in the OUTPUT chain. What this means is that iptables will do all recalculation of states and so on within the PREROUTING chain. If we send the initial packet in a stream, the state gets set to NEW within the OUTPUT chain, and when we receive a return packet, the state gets changed in the PREROUTING chain to ESTABLISHED, and so on. If the first packet is not originated by ourself, the NEW state is set within the PREROUTING chain of course. So, all state changes and calculations are done within the PREROUTING and OUTPUT chains of the nat table."


I received this email :

Since I've recently failed in a few iproute2 experiments, I have the following comment on the packet travelling guide: It is incomplete in respect to locally generated packets.

The traversal guide states that routing actually happens before the packet enters the netfilter OUTPUT queue. However, this is not all that happens in current kernels. If the packet is somehow modified while traversing the output queue (for example, by putting a fwmark on it), netfilter recognizes that the packet needs to be routed again and does so. So, there is possibly another 'OUTPUT ROUTING' rectangle after the netfilter OUTPUT chain.

However, if I'm not mistaken, there are some problems with that (as per my recent posting to the lartc- and netdev-list, it seems that the source address is chosen before the netfilter OUTPUT is traversed, and subsequent the subsequently chosen other route's src attribute no longer affects the source address of the socket/packet).

Now, I also might be totally wrong, but so far nobody has been able to point out what exactly I'm misunderstanding...


See also https://www.comparitech.com/net-admin/beginners-guide-ip-tables/ for a basic iptables guid>

More info

http://www.gnumonks.org/ftp/pub/doc/packet-journey-2.4.html


http://www.linuxvirtualserver.org/Joseph.Mack/HOWTO/LVS-HOWTO-19.html#ss19.21