Publications on Packet Buffering and Queueing (In Reverse Chronological Order)
(Also available in BibTeX format)
Abstract: Backbone routers typically require large buffers to hold packets during congestion. A common rule of thumb is to provide, at every link, a buffer equal to the product of the round-trip time and the link capacity. This translates into gigabytes of buffering operating at line rate at every link. Such a size and rate necessitate the use of SDRAM with a bandwidth of, for example, 80 Gbps for a link speed of 40 Gbps. With speedup in the switch fabrics used in most routers, the bandwidth requirement of the buffer increases further. While multiple SDRAM devices can be used in parallel to achieve high bandwidth and storage capacity, a wide logical data bus composed of these devices yields suboptimal performance for arbitrarily sized packets. An alternative is to divide the wide logical data bus into multiple logical channels and store packets in them independently. In such an organization, however, the cumulative pin count grows due to the additional address buses, which can offset the performance gained. We find that, given the characteristics of several existing memory technologies and the particular packet-size distribution of Internet traffic, a judiciously architected data channel can greatly enhance the performance per pin. In this paper, we derive an expression for the effective memory bandwidth of a parallel-channel packet buffer and show how it can be optimized for a given number of I/O pins available for interfacing to memory. We believe that our model can greatly aid packet buffer designers in achieving the best performance.
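To make the rule-of-thumb buffer sizing and the wide-bus penalty concrete, here is a small sketch (the function names and the simple per-access utilization model are illustrative assumptions, not the paper's derivation):

```python
import math

def buffer_size_bytes(rtt_s, link_gbps):
    # Rule-of-thumb buffer: round-trip time x link capacity, in bytes.
    return rtt_s * link_gbps * 1e9 / 8

def bus_utilization(pkt_bytes, bus_bytes):
    # Fraction of raw bus bandwidth carrying payload when a packet of
    # pkt_bytes is moved over a logical bus bus_bytes wide: each access
    # transfers a full bus word, so any partial word is wasted.
    accesses = math.ceil(pkt_bytes / bus_bytes)
    return pkt_bytes / (accesses * bus_bytes)

# A 250 ms round trip at 40 Gbps calls for 1.25 GB of buffering.
print(buffer_size_bytes(0.25, 40))
# A 65-byte packet on a 64-byte-wide bus wastes almost half the bandwidth.
print(bus_utilization(65, 64))
```

With a fixed pin budget, splitting the bus into narrower channels raises per-packet utilization at the cost of extra address pins; the paper's model quantifies that trade-off.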
Abstract: An extensible firewall has been implemented that performs packet filtering, content scanning, and per-flow queuing of Internet packets at gigabit/second rates. The firewall uses layered protocol wrappers to parse the content of Internet data. Packet payloads are scanned for keywords using parallel regular expression matching circuits. Packet headers are compared against rules specified in Ternary Content Addressable Memories (TCAMs). Per-flow queuing is performed to mitigate the effect of Denial of Service attacks. All packet processing operations were implemented in reconfigurable hardware and fit within a single Xilinx Virtex XCV2000E Field Programmable Gate Array (FPGA). The single-chip firewall has been used to filter Internet SPAM and to guard against several types of network intrusion. Additional features were implemented in extensible hardware modules deployed using run-time reconfiguration.
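The per-flow queuing defense can be sketched in software as bounded per-flow buffers, so a single flooding flow cannot monopolize storage (a hypothetical toy model; the class name and limit are ours, and the actual firewall implements this in FPGA hardware):

```python
from collections import defaultdict, deque

class PerFlowQueues:
    """Toy per-flow queuing: each flow gets its own bounded queue, so one
    flooding flow cannot consume all buffer space (software sketch only)."""

    def __init__(self, per_flow_limit=16):
        self.limit = per_flow_limit
        self.queues = defaultdict(deque)

    def enqueue(self, flow_id, pkt):
        q = self.queues[flow_id]
        if len(q) >= self.limit:
            return False  # drop: this flow exceeded its share
        q.append(pkt)
        return True
```

A flood from one flow fills only that flow's queue; packets from well-behaved flows are still accepted.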
Abstract: Field Programmable Gate Arrays (FPGAs) are being used to provide fast Internet Protocol (IP) packet routing and advanced queuing in a highly scalable network switch. A new module, called the Field-programmable Port Extender (FPX), is being built to augment the Washington University Gigabit Switch (WUGS) with reprogrammable logic.
FPX modules reside at the edge of the WUGS switching fabric. Physically, the module is inserted between an optical line card and the WUGS gigabit switch backplane. The hardware used for this project allows ports of the switch populated with an FPX to operate at rates up to 2.4 Gigabits/second. The aggregate throughput of the system scales with the number of switch ports.
Logic on the FPX module is implemented with two FPGA devices. The first device is used to interface between the switch and the line card, while the second is used to prototype new networking functions and protocols. The logic on the second FPGA can be reprogrammed dynamically via control cells sent over the network.
Abstract: This paper proposes an innovative concept called virtual output queue to support Available Bit Rate (ABR) traffic on an input-buffered, per-virtual circuit queued switch. This technique allows ABR models developed for output-buffered systems to be migrated to an input-buffered system.
In order to evaluate the virtual output queue and to compare different ABR algorithms, a simulator of the ATM testbed at the University of Illinois has been enhanced with ABR functions. This paper provides simulation results for the input-buffered variation of the ERICA+ algorithm.
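The virtual output queue idea can be illustrated with a toy model in which each input port keeps a separate queue per output, so a cell headed to a busy output never blocks cells headed elsewhere (the class name and scheduling policy here are illustrative assumptions, not the paper's design):

```python
from collections import deque

class InputPort:
    """Virtual output queuing at one input port: one queue per output,
    avoiding the head-of-line blocking of a single FIFO (sketch only)."""

    def __init__(self, num_outputs):
        self.voqs = [deque() for _ in range(num_outputs)]

    def enqueue(self, cell, output):
        self.voqs[output].append(cell)

    def dequeue_for(self, free_outputs):
        # Serve the first non-empty VOQ whose output is currently free.
        for out in free_outputs:
            if self.voqs[out]:
                return out, self.voqs[out].popleft()
        return None
```

With a single FIFO, a cell at the head waiting for a busy output would stall everything behind it; with VOQs, a cell for any free output can be served immediately.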
Abstract: This paper presents the design and prototype of an intelligent, 3-Dimensional Queue (3DQ) system for high-performance, scalable, input-buffered ATM switches. The 3DQ uses pointers and linked lists to organize ATM cells into multiple virtual queues according to priority, destination, and virtual connection, and then selects the proper cells for switching based on Quality-of-Service (QoS) parameters and run-time traffic conditions. Using Field Programmable Gate Array (FPGA) devices, our prototype hardware can process ATM cells at 622 Mb/s (OC-12). Using more aggressive technology (Multi-Chip Modules (MCMs) and fast GaAs logic), the same 3DQ design can process cells at 2.5 Gb/s (OC-48). Combined with the Matrix-Unit-Cell-Scheduler (MUCS) module, a high-performance input-buffered ATM switch system has been designed that avoids Head-Of-Line (HOL) blocking and achieves near-100% link bandwidth utilization.
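A software analogue of the 3DQ organization keeps virtual queues keyed by (priority, destination, virtual connection); the highest-priority-first selection below is a stand-in for the QoS- and traffic-aware logic described in the abstract (a sketch; the real 3DQ uses hardware pointers and linked lists):

```python
from collections import deque

class ThreeDQueue:
    """Toy 3DQ model: cells are linked into virtual queues indexed by
    (priority, destination, VC); selection scans from highest priority
    (lowest number) for a cell bound to a currently free destination."""

    def __init__(self):
        self.queues = {}  # (priority, dest, vc) -> deque of cells

    def enqueue(self, cell, priority, dest, vc):
        self.queues.setdefault((priority, dest, vc), deque()).append(cell)

    def select(self, free_dests):
        # Keys sort by priority first, so higher-priority cells win.
        for (prio, dest, vc), q in sorted(self.queues.items()):
            if dest in free_dests and q:
                return q.popleft()
        return None
```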
Abstract: This thesis presents the design and implementation of the multicast, input-buffered Asynchronous Transfer Mode (ATM) switch for use with the iPOINT testbed. The input-buffered architecture of this switch is optimal in terms of the memory bandwidth required for the implementation of an ATM queue module. The contention resolution algorithm used by the iPOINT switch supports atomic multicast, enabling the simultaneous delivery of ATM cells to multiple output ports without the need for recirculation buffers, duplication of cells in memory, or multiple clock cycles to transfer a cell from an input queue module.
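The idea of atomic multicast contention resolution can be sketched as granting an input only when all of its requested outputs are simultaneously free, so the cell reaches every destination in the same cell time (a simplified toy, not the actual iPOINT algorithm):

```python
def schedule_multicast(requests, num_outputs):
    """Grant an input only if ALL of its requested outputs are still
    free, delivering a multicast cell to every destination at once
    without recirculation or cell duplication (illustrative sketch).
    requests: {input_id: [output_ids]}; returns {input_id: [output_ids]}."""
    free = set(range(num_outputs))
    grants = {}
    for inp, outs in requests.items():
        if set(outs) <= free:
            grants[inp] = outs
            free -= set(outs)
    return grants
```

An input whose destination set overlaps an earlier grant simply waits for a later cell time rather than sending a partial multicast.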
The implementation of the prototype switch is unique in that it was entirely constructed using Field Programmable Gate Array (FPGA) technology. A fully functional, five-port, 800 Mbps ATM switch has been developed and currently serves as the high-speed, optically interconnected, local area network for a cluster of Sun SPARCstations and the gateway to the wide-area Blanca/XUNET gigabit testbed. Through the use of FPGA technology, new hardware-based switching algorithms and functionality can be implemented without the need to modify hard-wired logic. Further, through the use of the remote switch manager, switch controller, and FPGA controller, the management, operation, and even logic functionality of the iPOINT testbed can be dynamically altered, all without the need for physical access to the iPOINT hardware.
Based on the existing prototype switch, the design of the FPGA-based, gigabit-per-second "Any-Queue" module is presented. In its maximum configuration, this design supports up to 256 queue modules, providing an aggregate throughput of 180 Gbps. Further, the design of a 16-port switch fabric with an aggregate throughput of 11.2 Gbps is documented that can be implemented entirely with only eight FPGA devices.
In addition to the design of the switch module, this thesis describes the supporting components of the iPOINT testbed, including the network control and application software, the hardware specifications of the switch interface, and the device requirements of the optoelectronic components used in the testbed. VHDL and schematics of the switch hardware and C/C++ source code for the supporting systems are included.