Sunday, November 27, 2011

CEF and load sharing

Load sharing is one of those awkward areas that is full of confusing parts. In this post we cover its ABCs, and later on we will cover more parts in detail. We chose the name “CEF and load sharing” for the post because of the main role that CEF plays in load sharing.

In the IP routing context, the forwarding/switching mechanism that the router uses is what actually controls the load-sharing process (a data/forwarding plane operation). Having multiple routes in the routing table says nothing about how exactly load sharing will be done: you might be left with poor load sharing, or no load sharing at all, even though the routing table holds multiple routes for a destination.

The routing protocols are responsible for placing multiple paths in the routing table in the first place (a control plane operation). By default all the IGPs can install 4 equal-cost paths, while BGP defaults to only 1 (BGP behaves completely differently from the IGPs; we will cover load sharing with BGP in detail in a later post). To control the maximum number of paths allowed per routing protocol we can use the maximum-paths command (the maximum was 4 in IOS releases earlier than 11.0, 8 with IOS Release 12.0S based software, 16 with IOS Release 12.3T based software, and 32 with IOS Release 12.2S based software).

NOTE This post is not meant to explain CEF operation; we’ll only be focusing on CEF load sharing. However, we might write a dedicated CEF inside-out post later.

The most popular forwarding/switching mechanisms on Cisco routers are: process switching (performs per-packet load sharing), fast switching (performs per-destination load sharing) and CEF (can do both per-packet and per-destination load sharing, the latter being completely different from fast switching's per-destination load sharing, plus a newer flavor, per-port load sharing).

NOTE According to Cisco, IPv4 fast switching is removed with the implementation of the Cisco Express Forwarding infrastructure enhancements for Cisco IOS 12.2(25)S-based releases and Cisco IOS Release 12.4(20)T. For these and later Cisco IOS releases, switching paths are Cisco Express Forwarding switched or process switched. This makes the switching decision easier for future development of software features. Starting with the implementation of the Cisco Express Forwarding enhancements and the removal of IPv4 fast switching, components that do not support Cisco Express Forwarding will work only in process switched mode.

Load-sharing with CEF

For each destination with multiple equal-cost paths (or unequal-cost paths in the case of EIGRP using variance, BGP using the BGP Link Bandwidth feature, or MPLS-TE) the router creates 16 hash buckets, each pointing to one of the available paths.

The load sharing is controlled by the ratio of the number of buckets pointing to each path (outgoing interface). With equal-cost paths the buckets are evenly distributed: two equal-cost paths result in 8 buckets per path, three result in 5 per path (yes, one bucket is left unused), four result in 4 per path, and so on. In unequal-cost scenarios each path is associated with a different number of buckets, according to the load-sharing ratio.
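The bucket arithmetic above can be sketched in a few lines of Python. This is only an illustration of the counting, not IOS code: the function names are made up for this post, and the proportional split used for the unequal-cost case is an assumption (the post only says the buckets follow the load-sharing ratio).

```python
TOTAL_BUCKETS = 16  # hash buckets created per destination, as described above


def equal_cost_buckets(num_paths: int) -> list[int]:
    """Return how many buckets point to each of num_paths equal-cost paths."""
    if not 1 <= num_paths <= TOTAL_BUCKETS:
        raise ValueError("num_paths must be between 1 and 16")
    per_path = TOTAL_BUCKETS // num_paths  # any leftover buckets stay unused
    return [per_path] * num_paths


def weighted_buckets(weights: list[int]) -> list[int]:
    """Hypothetical unequal-cost split: buckets proportional to each weight.

    Assumption for illustration only; the exact IOS rounding is not
    described in the post.
    """
    total = sum(weights)
    return [TOTAL_BUCKETS * w // total for w in weights]


for n in (2, 3, 4):
    counts = equal_cost_buckets(n)
    print(n, "paths ->", counts, "unused:", TOTAL_BUCKETS - sum(counts))

print("3:1 ratio ->", weighted_buckets([3, 1]))
```

With three paths the sketch reproduces the 5/5/5 split with one bucket left over, matching the numbers quoted above.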

CEF has three load-sharing options:
•per-destination (per-session):

I prefer to call it per-session, as stated in the show ip cef x.x.x.x internal command output, since it is actually based on both the source and the destination IP addresses in the IP packet rather than solely the destination: both are hashed into a 4-bit value that is used to select the outgoing interface. This is the default CEF load-sharing option.

Per-destination load sharing clearly performs a statistical distribution of traffic, so load sharing becomes more effective as the number of source/destination pairs increases. It can also leave one link overloaded while the others are underutilized, if a relatively heavy session between a certain source/destination pair flows over that link.

The hash calculation depends on the algorithm used. The original algorithm uses only the source and destination IP addresses to compute a 4-bit hash value, giving 16 possible values, and thus selects one of the 16 available buckets, each pointing to one of the outgoing paths. As a result, all the routers in the network running the same algorithm reach the same result for a given source/destination pair, which introduced a load-sharing hitch called CEF Load-Sharing Polarization (there is a good example of this in the Cisco Press book “Cisco Express Forwarding”). To circumvent this behavior, the universal algorithm (the default in current IOS versions) adds a 32-bit router-specific value to the hash function (called the Fixed ID, which can be manually controlled; a router uses its highest loopback IP address as this value when booting). This seeds the hash function on each router with a unique ID, so that the same source/destination pair can hash into a different 4-bit value on different routers along the path, which provides better network-wide load sharing and avoids the polarization issue.
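To make the polarization effect concrete, here is a small Python simulation. It is a sketch under loud assumptions: the real CEF hash function is not reproduced here, so SHA-256 stands in for it, the helper names bucket and split are invented for this example, and the two router IDs are arbitrary values.

```python
import hashlib
import ipaddress


def bucket(src: str, dst: str, router_id: int = 0) -> int:
    """Map a source/destination pair (plus a per-router seed) to a bucket 0..15.

    SHA-256 is only a stand-in for the real CEF hash, which is not shown here.
    """
    data = (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + router_id.to_bytes(4, "big")
    )
    return hashlib.sha256(data).digest()[0] & 0x0F  # keep 4 bits


def split(flows, router_id=0):
    """Split flows over two equal-cost paths (bucket parity picks the path)."""
    paths = {0: [], 1: []}
    for src, dst in flows:
        paths[bucket(src, dst, router_id) % 2].append((src, dst))
    return paths


flows = [(f"10.0.0.{i}", "192.0.2.1") for i in range(1, 101)]

# Original algorithm: every router computes the same hash for a given pair.
a = split(flows)
b = split(a[0])  # router B only sees what router A sent over its path 0
print("original :", len(b[0]), len(b[1]))  # the second count is always 0

# Universal algorithm: each router seeds the hash with its own ID.
a_u = split(flows, router_id=0x0A000001)
b_u = split(a_u[0], router_id=0x0A000002)
print("universal:", len(b_u[0]), len(b_u[1]))
```

In the first case router B repeats router A's decision for every flow it receives, so one of its two links stays idle. In the second case the different seeds let B split the same flows again.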

NOTE There is a third algorithm called the tunnel algorithm. I couldn’t find details of its anatomy, but Cisco states that it is meant to solve load sharing when tunneling techniques such as MPLS, GRE and L2TP are in operation, since tunneling reduces the traffic pattern to a small number of sessions (between the tunnel head and tail ends), which introduces another form of traffic polarization. This algorithm also uses a unique per-router ID to work around the issue. Again, I can’t find more details about this algorithm, but if I do I’ll let you know.

•per-packet:

Packets are handled in a round-robin fashion, ensuring that the traffic is balanced over multiple links. However, per-packet load sharing is not generally recommended, because it commonly results in out-of-order packets. This reduces TCP throughput (since TCP has to put the segments back in order), can cause data loss for UDP applications (since UDP does not reorder anything) and, to make things scarier, out-of-order packets might be interpreted as an attack by firewalls.

The default CEF load-sharing mode is per-destination; we can change it with the ip load-sharing per-packet interface command on the outgoing interfaces involved.

NOTE Since load-sharing decisions are made on the outbound interfaces, the choice between per-packet and per-destination load sharing has to be configured on the outbound interfaces.
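The reordering risk of per-packet load sharing can be illustrated with a toy model in Python. This is a sketch, not a router simulation: the function name, the one-packet-per-millisecond send rate and the link delays are all assumptions chosen to make the effect visible.

```python
from itertools import cycle


def per_packet_arrival_order(num_packets, link_delays_ms, send_interval_ms=1.0):
    """Send packets round-robin over the links and return the sequence
    numbers in the order the receiver sees them."""
    links = cycle(range(len(link_delays_ms)))
    arrivals = []
    for seq in range(num_packets):
        link = next(links)  # round-robin choice of outgoing link
        arrival_time = seq * send_interval_ms + link_delays_ms[link]
        arrivals.append((arrival_time, seq))
    arrivals.sort()  # the receiver sees packets in arrival-time order
    return [seq for _, seq in arrivals]


# Two links with equal delay: packets arrive in the order they were sent.
print(per_packet_arrival_order(6, [2.0, 2.0]))  # [0, 1, 2, 3, 4, 5]

# Second link is 4 ms slower: every other packet arrives late.
print(per_packet_arrival_order(6, [1.0, 5.0]))  # [0, 2, 4, 1, 3, 5]
```

The traffic is split perfectly evenly in both runs, but as soon as the links differ in delay the receiver has to deal with reordered packets.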
•per-port (per-flow)

This is the most suitable option (introduced with IOS release 12.4(11)T) for networks with a low number of sources/destinations where the majority of the traffic flows between hosts that use different port numbers, as commonly seen with Real-time Transport Protocol (RTP) streams. It simply adds the layer 4 source port, destination port, or both to the CEF hashing function. This option is enabled with the ip cef load-sharing algorithm include-ports command in global configuration mode.

The most common scenario where this option is the only effective solution is a subnet of hosts NATed to a single IP address, with a router that has multiple paths sitting on the way to their traffic destination. The per-destination option is obviously useless here if all the hosts are communicating with a single destination, since there is always just one source/destination pair. Including the layer 4 ports in the hashing function lets the individual sessions spread over the available paths.
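The NAT scenario can be sketched in Python as well. As before this is an illustration under assumptions: SHA-256 stands in for the real CEF hash, the bucket helper is invented for this example, and the addresses and port numbers are documentation values picked arbitrarily.

```python
import hashlib
import ipaddress


def bucket(src, dst, sport=None, dport=None):
    """Hash addresses, and optionally layer 4 ports, into a bucket 0..15.

    SHA-256 is a stand-in; the real CEF hash is not reproduced here.
    """
    data = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    if sport is not None:
        data += sport.to_bytes(2, "big")
    if dport is not None:
        data += dport.to_bytes(2, "big")
    return hashlib.sha256(data).digest()[0] & 0x0F


# 200 sessions from hosts NATed to one public address, all to one server.
sessions = [("203.0.113.10", "198.51.100.20", 40000 + i, 443) for i in range(200)]

addr_only = {bucket(s, d) for s, d, _, _ in sessions}
with_ports = {bucket(s, d, sp, dp) for s, d, sp, dp in sessions}

print("buckets used, addresses only:", len(addr_only))  # always 1
print("buckets used, with ports    :", len(with_ports))
```

With addresses only, every session lands in the same bucket and therefore on the same link. Once the ports take part in the hash, the sessions spread over many buckets even though the address pair never changes.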

I hope that I’ve been informative.

Mirza Mukaram Baig

ARP Caching and TimeOut

From time to time I find myself craving the fundamentals. I do this for two main reasons: the first is that fundamentals are the building blocks of all complex networking topics and understanding them deeply makes a better engineer; the second is a longing for simplicity after doing some complex tasks.

One of these fundamentals that is worth reviewing is the Address Resolution Protocol (ARP); this protocol is one of the main building blocks of any network existing on earth today.

Every time a network device sends an Ethernet frame to another device, it has to construct the frame, and to do that it needs the hardware address that maps to the destination IP address. ARP is responsible for doing this job.

Each time a device sends an ARP message, network resources are consumed. Without caching, two hosts that want to communicate would have to exchange ARP messages again for every packet. Imagine how ugly that would be when transferring large data streams, such as a large file exchange via FTP.

ARP caching provides the solution for this efficiency problem as explained below.

ARP Caching

If you know you are going to send many emails to a friend, is it efficient to call him every time to ask for his email address? I think the answer is no, unless you are fascinated by listening to his voice. You simply call him once to ask for the address and cache the information somewhere for future use, and that’s exactly what ARP does.

When a host sends an ARP request to another host and a reply is received, the sender caches the received information in a table for later use.

Going back to our analogy of the email sender, what if you know that you are not going to send any more emails to your friend (“God keep your friends”)? Is it still efficient to keep his address in your cache table? I think not; you have to time out unused information. Again, this is exactly what ARP does.

If an ARP entry is not used for a specific amount of time, called the ARP timeout, the entry is removed from the cache table.

There is no standard value for this amount of time and it varies from one vendor to another. I will limit my discussion to Cisco devices to clear up the idea.

One more point to mention here is that entries in the ARP table can be static (created by manual configuration) or dynamic (created automatically by the normal operation of the protocol). Static entries remain in the table forever and are not timed out.

The default timeout is 4 hours on Cisco devices. This means that a dynamic ARP entry remains in the cache table for 4 hours before the router attempts to refresh it. If the entry is no longer needed, it is removed.
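The caching and timeout behavior described so far can be sketched as a small Python class. This is a simplified model, not how IOS implements it: the class and method names are invented for this example, the IP and MAC addresses are sample values, and an aged-out entry is simply dropped instead of being refreshed first.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

DEFAULT_TIMEOUT = 4 * 60 * 60  # 4 hours in seconds, the Cisco default cited above


@dataclass
class ArpEntry:
    mac: str
    learned_at: float
    static: bool = False


class ArpCache:
    """Toy ARP cache: dynamic entries age out, static entries never do."""

    def __init__(self, timeout: float = DEFAULT_TIMEOUT,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the example can fast-forward time
        self.entries: dict[str, ArpEntry] = {}

    def learn(self, ip: str, mac: str, static: bool = False) -> None:
        self.entries[ip] = ArpEntry(mac, self.clock(), static)

    def lookup(self, ip: str) -> Optional[str]:
        entry = self.entries.get(ip)
        if entry is None:
            return None  # cache miss: an ARP request would be sent
        if not entry.static and self.clock() - entry.learned_at >= self.timeout:
            del self.entries[ip]  # aged out (simplification: no refresh attempt)
            return None
        return entry.mac


# Usage with a fake clock and a 10-second timeout.
now = [0.0]
cache = ArpCache(timeout=10, clock=lambda: now[0])
cache.learn("10.0.0.2", "ca02.0a74.0008")
cache.learn("10.0.0.3", "ca00.0a11.0003", static=True)
print(cache.lookup("10.0.0.2"))  # ca02.0a74.0008
now[0] = 11.0                    # 11 seconds later
print(cache.lookup("10.0.0.2"))  # None: the dynamic entry timed out
print(cache.lookup("10.0.0.3"))  # ca00.0a11.0003: static entries stay
```

A lookup that returns None is the point where a real host would send a new ARP request and then call learn again with the reply.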

You can show the ARP table using the command show arp and change the timeout timer for a specific interface using the interface level command arp timeout seconds.

R1#show ip arp
Protocol Address Age (min) Hardware Addr Type Interface
Internet - sa00.0a11.0001 ARPA FastEthernet0/0
Internet 97 sa02.0a11.0002 ARPA FastEthernet0/0
Internet 8 sa00.0a11.0003 ARPA FastEthernet0/5
Internet 136 sa04.0a11.0004 ARPA FastEthernet0/2

!-- setting the timeout to 10 seconds
R1(config)#int f0/2
R1(config-if)#arp timeout 10

!-- see the debug output, which shows a 10-second difference between replies
Jan 1 00:01:14: IP ARP: sent req src sa00.0a74.0005,
dst ca02.0a74.0008 FastEthernet0/0
Jan 1 00:01:14: IP ARP: arp_process_request:, hw: sa02.0a74.0008; rc: 3
Jan 1 00:01:14: IP ARP: rcvd rep src sa02.0a74.0008, dst FastEthernet0/0
Jan 1 00:01:14: IP ARP: creating entry for IP address:, hw: sa02.0a74.0008
Jan 1 00:01:24: IP ARP: sent req src ca00.0a74.0008,
dst ca02.0a74.0008 FastEthernet0/0
Jan 1 00:01:24: IP ARP: arp_process_request:, hw: ca02.0a74.0008; rc: 3
Jan 1 00:01:24: IP ARP: rcvd rep src ca02.0a74.0008, dst FastEthernet0/0
Jan 1 00:01:24: IP ARP: creating entry for IP address:, hw: ca02.0a74.0008

Note: The ARP cache table is not the same as the MAC address table used by switches; each has its own timers.

Thank you once again.