Providing Guaranteed Rate Services in the Load Balanced Birkhoff-von Neumann Switches
Cheng-Shang Chang, Duan-Shin Lee, Chi-Yao Yue, Proceedings of IEEE Infocom 2003.
Summary:
Load balanced Birkhoff-von Neumann switches with multiple stages of buffering are designed to switch data with very little complexity and achieve 100 % throughput under most traffic conditions. They use two stages of switching using crossbars with buffering before and after each crossbar. The first stage of switching performs load balancing of the traffic so that the traffic to any output is balanced at the inputs to the second switching stage. The second switching stage performs the Birkhoff-von Neumann switching, which in this case of load balanced traffic, requires the use of simple periodic connection patterns as in the first stage. Thus, the switching operation is a very simple process and does not require too much memory. This is very important in high speed switches because the time available to switch packets is very low (40 ns for 50 byte packets on a 10 Gbps link). This switch also supports multicast and variable length packets (by using resequencing buffers at the output stage).
This paper proposes two schemes for providing rate guaranteed services in Load Balanced Birkhoff-von Neumann switches, namely Earliest Deadline First (EDF) and a frame based scheme. In the EDF scheme, each packet is assigned a departure time that is the departure time from the corresponding FCFS work conserving link with capacity equal to the guaranteed rate of the flow. A jitter control stage is implemented after the load balancing stage to "retime" the packets. Finally, there is a resequencing and output buffer stage to send the packets out in the desired schedule. It has been shown that each packets leaves the switch at the targeted departure time or deadline plus a constant that depends on the number of slows and the size of the switch. Also, the size of the resequencing buffer is bounded.
In the EDF scheme, the resequencing buffer and the jitter control mechanism are significantly complex to implement. The authors then propose a frame based scheme that is simple and has finite bounds for the end-to-end packet delay and the size of the buffers if the traffic satisfies certain assumptions that is not complex to implement (O(1) online complexity).
Comments:
1) Load balanced Birkhoff-von Neumann switching is one of the first switching solutions that moves away from the bipartite weighted matching formulation for the scheduling problem in input-queued switches and gives good performance. This method does not try to schedule one packet from each input at a time and hence, avoids the processing complexity associated with that. When the traffic satisfies certain criteria, the switching scheme is very simple to implement and hence, can scale to very high link speeds.
2) However, the assumption that the traffic satisfies the "doubly stochastic" rate matrix condition is unreasonable for IP routers. TCP, by design, tries to overload links until it detects an overload condition and backs off bringing down the load on the links and very often the load fluctuates wildly and certain output links can be overloaded for extended periods of time. Thus, to make these switches practical for The Internet, it is important to study how they perform in the worst case as compared to other switching methods.
3) To implement the frame-based scheme to provide guaranteed rate services, the authors group time slots into fixed sized frames. The size of the frames is defined as the first value that makes the product of r(i,j) and F an integer, where r(i,j) is the rate of flow i at input j and F is the frame size. Thus a small rate can lead to a very large frame size. If the frame size is too large, it becomes necessary to restrict the frequency of the number of changes that can be made to the rate matrix (eg., by adding new flows or removing existing flows). Further, this probably implies that the rates need to be multiples of a minimum value, to avoid changing the frame size for every new connection. Also, the delay through the switch is directly proportional to the frame size and thus, a very large frame size might lead to a proportionally large delay through the switch. The amount of memory required also is proportional to the frame size. The paper does not address the impact of the frame sizes and equivalently, the granularity of the rates of connections on the performance of the switch.
Let us assume that the link rate is 10 Gbps and cell size is 50 bytes. A time slot of operation is 40 ns. Suppose we want to provision connections of bandwidth 1 Mbps, this corresponds to a rate of 1/10000. Thus, the maximum frame size = 10000. This implies that the switch configuration can be changed only every 10000 time slots = 0.4 ms (might be more than this value) for correct operation. Also, the delay through the switch is bounded by approximately 2NF, where N is the number of input ports. Assuming a switch of capacity of 1 Tbps, N = 100. Thus, the delay through the switch is 2*100*10000 time slots = 40 ms. Also, the amount of central buffers required by this scheme is given by 2N2F. This gives us total buffer of 4 sec or 40 Gigabits or 5 Gigabytes per input port!
Critique by Jai Ramamirtham