IP Network: Software-Based Fast Packet Forwarding Design Principles

The main responsibility of a router is to forward each IP (L3) packet out as quickly as possible once it enters the router.

![IP Packet](/uploads/ip/router-1.png)

Let's look inside the router to see which principles to follow in order to achieve the best throughput.

High-level architecture of software-based packet forwarding

The following diagram describes the high-level architecture of software-based packet forwarding, regardless of which OS is running.

![IP Packet forward architecture](/uploads/ip/pkt-forward-arch.jpg)

Here is the packet processing flow:

  1. The packet enters the PHY of the router's ingress Ethernet interface.
  2. The PHY decodes the line signal into bits, assembles the packet, and forwards it to the Ethernet controller.
  3. The Ethernet controller checks the state of the receive ring descriptors and copies the packet into memory through DMA.
  4. The Ethernet controller raises an interrupt to the running OS after the packet has been placed into memory through DMA.
  5. The running OS handles the receive interrupt request and processes the packet according to the system's forwarding configuration. The interrupt service routine (ISR) adjusts the receive ring state so that new packets can be received.
  6. The packet is eventually placed in a descriptor of the egress interface's transmit ring.
  7. The Ethernet controller passes the packet to the egress interface's PHY, which performs the actual transmission.
  8. The Ethernet controller raises an interrupt to the running OS to indicate that the packet transmission is complete.
  9. The running OS handles the transmit interrupt and adjusts the transmit ring state so that more packets can be transmitted later.
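The receive half of this flow (steps 3 to 5) can be sketched in C as a small simulation. This is only an illustration under stated assumptions: the descriptor layout, the `DESC_DONE` status bit, and every function name here (`rx_ring_init`, `sim_dma_receive`, `rx_isr`, `forward_count`) are hypothetical, and a `memcpy` stands in for the controller's DMA write. A real driver takes its descriptor format from the controller's hardware specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8u          /* power of two so the index wraps with a mask */
#define BUF_SIZE  2048u
#define DESC_DONE 0x1u        /* hypothetical "descriptor done" status bit */

/* Hypothetical receive descriptor; real layouts come from the datasheet. */
struct rx_desc {
    uint8_t *buf;             /* DMA target (a bus address on real hardware) */
    uint16_t len;             /* bytes written by the controller */
    uint16_t status;          /* DESC_DONE once the packet is in memory */
};

struct rx_ring {
    struct rx_desc desc[RING_SIZE];
    uint8_t bufs[RING_SIZE][BUF_SIZE];
    uint32_t hw_next;         /* next slot the simulated controller fills */
    uint32_t sw_next;         /* next slot the ISR examines */
};

uint32_t total_forwarded_bytes;   /* updated by the sample handler below */

/* Sample forwarding hook: only counts bytes instead of routing them. */
void forward_count(const uint8_t *data, uint16_t len)
{
    (void)data;
    total_forwarded_bytes += len;
}

void rx_ring_init(struct rx_ring *r)
{
    memset(r, 0, sizeof(*r));
    for (uint32_t i = 0; i < RING_SIZE; i++)
        r->desc[i].buf = r->bufs[i];
}

/* Stand-in for steps 3-4: the controller places a packet in the next slot.
 * Returns 0 on success, -1 if the ring is full (the packet is dropped). */
int sim_dma_receive(struct rx_ring *r, const void *pkt, uint16_t len)
{
    struct rx_desc *d = &r->desc[r->hw_next];
    assert(len <= BUF_SIZE);
    if (d->status & DESC_DONE)
        return -1;            /* software has not recycled this slot yet */
    memcpy(d->buf, pkt, len); /* memcpy plays the role of the DMA write */
    d->len = len;
    d->status = DESC_DONE;
    r->hw_next = (r->hw_next + 1) & (RING_SIZE - 1);
    return 0;
}

/* Stand-in for step 5: the receive ISR drains completed descriptors,
 * hands each packet to forward(), then returns the slot to the controller.
 * Returns the number of packets handled in this invocation. */
uint32_t rx_isr(struct rx_ring *r, void (*forward)(const uint8_t *, uint16_t))
{
    uint32_t handled = 0;
    while (r->desc[r->sw_next].status & DESC_DONE) {
        struct rx_desc *d = &r->desc[r->sw_next];
        forward(d->buf, d->len);
        d->status = 0;        /* slot is free for new packets again */
        r->sw_next = (r->sw_next + 1) & (RING_SIZE - 1);
        handled++;
    }
    return handled;
}
```

Calling `sim_dma_receive` more than `RING_SIZE` times without running `rx_isr` makes it return -1, which mirrors a packet drop on a receive ring that is too short for the traffic burst.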

Based on the packet flow above, there are many factors to consider when optimizing packet forwarding throughput, and a few principles to follow:

  • The Ethernet controller's receive ring descriptors and transmit ring descriptors have to be defined exactly as the controller's hardware specification requires.
  • The lengths of the receive ring and the transmit ring need to be tuned to an optimum size. A ring that is too short causes packet drops; a ring that is too long wastes memory. Typically, for a Gigabit Ethernet interface, 1K entries works well for the receive ring, and the transmit ring is set to twice that size.
  • A pool of DMA-addressable memory needs to be pre-allocated and linked to the receive ring.
  • Since the packet is copied into memory through DMA, fast memory access is critical from a hardware design perspective.
  • The Ethernet controller's interrupt has to be set to a suitably high priority level so that requests are served promptly. It is also typically undesirable to raise one interrupt for every packet. Most Ethernet controllers support interrupt coalescing as well as interrupt throttling. For Intel GbE controllers, there is a good [document](http://www.intel.com/content/dam/doc/application-note/gbe-controllers-interrupt-moderation-appl-note.pdf) on this.
  • While processing the packet, there are a few aspects to consider:
    • If all processing can be done in interrupt context, throughput is typically higher than when switching to process context.
    • Every effort should be made to avoid copying the packet, as this can become a very expensive operation. Typically, a data structure should be allocated from the heap and associated with the packet in memory; a reference to that data structure (named *packet) should then be passed along the whole packet forwarding path, all the way to the transmit ring.
    • Packet processing should avoid accessing the Ethernet controller's registers, as this is a very expensive operation. To achieve this, a receive ring shadow and a transmit ring shadow are typically created in memory and kept in sync with the actual receive ring and transmit ring.
  • When packets have been transmitted, an interrupt needs to be raised to the running OS. The same principles as for the receive interrupt apply.
  • In the transmit ISR, the memory should be freed and returned to the pool for reuse.
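Several of these principles (a pre-allocated buffer pool, passing a packet reference instead of copying, a transmit ring shadow, and returning memory in the transmit ISR) can be illustrated together in a short C sketch. Everything named here is an assumption made for illustration: `struct packet`, `struct pool`, `struct tx_shadow`, and the functions around them are hypothetical, a static array stands in for DMA-addressable memory, and the `done` count passed to `tx_complete_isr` stands in for whatever completion information a real controller reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_BUFS 4u
#define BUF_SIZE  2048u
#define TX_RING   8u          /* power of two so indices wrap with a mask */

/* Hypothetical packet handle: metadata plus a pointer to the payload. */
struct packet {
    uint8_t *data;            /* points into the pre-allocated area */
    uint16_t len;
    struct packet *next;      /* free-list link */
};

struct pool {
    uint8_t mem[POOL_BUFS][BUF_SIZE];   /* stands in for DMA-addressable memory */
    struct packet pkts[POOL_BUFS];
    struct packet *free_list;
    uint32_t free_count;
};

void pool_init(struct pool *p)
{
    p->free_list = NULL;
    p->free_count = 0;
    for (uint32_t i = 0; i < POOL_BUFS; i++) {
        p->pkts[i].data = p->mem[i];
        p->pkts[i].len = 0;
        p->pkts[i].next = p->free_list;
        p->free_list = &p->pkts[i];
        p->free_count++;
    }
}

/* Take a buffer from the pool; returns NULL when the pool is exhausted. */
struct packet *pool_get(struct pool *p)
{
    struct packet *pkt = p->free_list;
    if (pkt != NULL) {
        p->free_list = pkt->next;
        pkt->next = NULL;
        p->free_count--;
    }
    return pkt;
}

void pool_put(struct pool *p, struct packet *pkt)
{
    assert(pkt != NULL);
    pkt->next = p->free_list;
    p->free_list = pkt;
    p->free_count++;
}

/* Software shadow of the transmit ring, kept in ordinary memory so the
 * forwarding path does not have to read controller registers. */
struct tx_shadow {
    struct packet *slot[TX_RING];
    uint32_t head;            /* next slot to fill */
    uint32_t tail;            /* oldest slot not yet completed */
};

void tx_shadow_init(struct tx_shadow *t)
{
    memset(t, 0, sizeof(*t));
}

/* Queue a packet by reference; no payload bytes are copied.
 * Returns 0 on success, -1 if the ring is full. */
int tx_enqueue(struct tx_shadow *t, struct packet *pkt)
{
    if (t->head - t->tail == TX_RING)
        return -1;
    t->slot[t->head & (TX_RING - 1)] = pkt;
    t->head++;
    return 0;
}

/* Transmit ISR: 'done' is the number of packets reported as sent.
 * Each completed buffer goes back to the pool. Returns buffers freed. */
uint32_t tx_complete_isr(struct tx_shadow *t, struct pool *p, uint32_t done)
{
    uint32_t freed = 0;
    while (freed < done && t->tail != t->head) {
        pool_put(p, t->slot[t->tail & (TX_RING - 1)]);
        t->tail++;
        freed++;
    }
    return freed;
}
```

In this sketch the same `struct packet` pointer obtained from `pool_get` is what ends up in the transmit shadow, so the payload stays in one place from receive to transmit. Coalescing several completions into one `tx_complete_isr` call (a `done` value greater than 1) corresponds to not raising one interrupt per packet.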