Network Diversification
a strategy for de-ossifying the internet
(under construction)

The Internet is one of the great technology success stories of the twentieth century. It has enabled greater access to information, provided new modes of communication among people and organizations and has fundamentally changed the way we work, play and learn. Unfortunately, the Internet's very success is now creating obstacles to innovation in the networking technology that lies at its core and the services that use it. The size and scope of the public Internet now make the introduction and deployment of new network technologies and advanced services difficult. While the research community has developed innovative solutions to a wide range of networking challenges, there has been remarkably little progress towards deploying these capabilities in the Internet at large. Even the deployment of relatively modest changes of widely acknowledged importance, such as IPv6, has proved quite difficult. The current Internet architecture and the business relationships that have developed among the various stakeholders have become a serious obstacle to its continuing evolution and growth.

Ossification is a natural evolutionary stage in the development of any highly successful technology. Success creates constituencies with a stake in the status quo, and this in turn creates inertia that inhibits change. However, the problem is more acute in the context of network technologies because network technologies are shielded from effective competition by the deployment obstacles raised by the high cost of infrastructure and the need for agreement among a large collection of organizations with often competing interests. If we are to free the global communications infrastructure from stagnation, we must find a way to enable new technologies to be deployed and used, at least on an experimental basis. Deployment must be carried out on a large enough scale to demonstrate the utility of new technologies to a broad audience and enable meaningful evaluation.

Network diversification provides a potential strategy for addressing the ossification of the Internet. In a diversified network, multiple metanetworks co-exist on top of a shared substrate. Different metanetworks provide alternate end-to-end packet delivery systems and may use different protocols and packet formats. Diversified overlay networks have already become an important tool for the research community [PlanetLab], but diversification also has the potential to become a first-class feature of the core network. The emergence of high performance network processors and advances in configurable logic devices now make it feasible to build diversified routers that can match the performance of conventional routers, while allowing far greater flexibility. If diversification is applied to the Internet at large, it will make it possible for new network technologies to be deployed alongside incumbent technologies, giving them the opportunity to succeed (or fail) on their own merits. We believe that this will stimulate innovation in both core network protocols and advanced services that combine computing and communication in creative new ways. Moreover, it can eliminate, once and for all, the problem of network ossification that seems to inevitably accompany success and growth.

This project seeks to develop both the organizing principles for a diversified networking system and the underlying technologies needed to make them a practical reality. (A note about terminology. We have adopted the terminology diversified networking instead of virtual networking, because we have found that the "V-word" has become so overloaded that it often leads to confusion.)

Basic Concepts

Metanetworks are implemented by metarouters, connected by metalinks.

Diversified Router Architecture

Resource Provisioning

Getting from Here to There


This work is supported by the National Science Foundation. However, any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

The purpose of this project is to develop a diversified router platform to enable networking researchers to more easily construct experimental networks that explore new and potentially disruptive architectural innovations. Network diversification is an emerging concept that envisions networks built around a reconfigurable substrate capable of hosting multiple network interlays. Each interlay can be thought of as a conceptually separate network, using its own protocols and providing its own distinct set of network services. The substrate provides the physical resources needed by the various interlays, through an automated resource provisioning subsystem. The proposed diversified router platform will host multiple interlay routers, using a flexible architecture built around a pool of processing engines that can be allocated to different interlay routers as needed. It will enable researchers to create systems that demonstrate new network architectural innovations in a form that allows them to be deployed and used to carry live traffic, even in settings that demand high levels of performance.

The proposed platform is designed to leverage the Advanced Telecommunications Computing Architecture (ATCA), a rapidly developing industry standard for telecommunications equipment. This will allow the use of standard equipment chassis, backplanes and power subsystems, eliminating the need to develop these components. It also makes it possible to purchase board-level components for some parts of the overall system. The proposed router platform will use a standard 14 slot ATCA chassis and will use distributed switching to simplify the overall system architecture. It will support link speeds up to 10 Gb/s with typical configurations supporting 4 to 6 such links and 8 to 10 processing engines of various types (the mix of line cards and processing engines is entirely flexible). While we expect single chassis configurations to be sufficient for most research uses, the platform will also provide optional inter-chassis links that will allow configuration of larger systems.

The system components making up the proposed platform will be developed in collaboration with an industry partner that will manufacture the boards and make them available for purchase by other researchers. This will allow the systems developed through this effort to be made widely available and used by researchers in a variety of different settings. We also plan to make the systems in our lab available for experimental use by others, through Washington University’s Open Network Lab.

2. Research Activities

The proposed diversified router platform can be used to support a wide range of networking research activities. The flexibility provided by its processing engines makes it ideal for exploring new network architectures or extensions to the existing internet architecture that require modification to the router data path. It will support implementation approaches that are directly comparable to those used in the networking industry (including configurable hardware and network processors), allowing researchers to demonstrate their ideas in a form that facilitates their transfer into commercial products. This section focuses on how the platform will enable new research initiatives within our group at Washington University. In section 4, we also discuss a developing opportunity for using this platform in a much larger context.

2.1. Diversified Networking

The concept of diversified networking is motivated by the observation that the Internet has fallen victim to its own stunning success.  The interplay of the end-to-end design of IP and the vested interests of competing stakeholders has led to its growing ossification.  Alterations to the Internet architecture that address its fundamental deficiencies or enable new services have been restricted to incremental changes.  The slow pace of this process stifles innovation and the adoption of disruptive technology.  A recent call to arms advances a research agenda to confront this impasse through virtualization [SH04] (here, we use the term diversification in place of virtualization, because we have found that virtualization has become so overloaded that its use in this context is often misunderstood).  Our research explores the potential of diversification as a central organizing principle for the Internet itself, rather than just a means to evaluate architectural alternatives. We argue that a diversified Internet would enable new network innovations to be rapidly and widely deployed. While this could lead to fragmentation of the communications infrastructure, this is not necessarily a serious problem in a diversified network environment. It is also likely that in a diversified Internet, the incumbent IP technology would advance much more quickly in order to stay competitive. This would greatly limit the tendency towards fragmentation. This effort is part of a growing body of work that seeks to enable the deployment of new network technologies using a more flexible underlying infrastructure [CH01, CH03, PE02, TO01, WA02, WH02].

Figure 1. Diversified network concept
Figure 1 illustrates the concept of diversified networking. An underlying network (called the substrate), consisting of end-systems, communication links and routers is shared by multiple interlay networks, comprising interlay protocol stacks in hosts, interlay links and interlay routers. Interlay networks enable fundamentally different end-to-end packet delivery mechanisms by allowing each interlay to define its own packet formats, addressing methods, packet forwarding, routing protocols, etc.  In this model, the “narrow waist” of the Internet architecture is a thin resource provisioning layer that provides automated mechanisms for allocating the substrate’s resources to different interlays.  The role of the provisioning layer is to enable interlay providers to automatically deploy, configure and operate interlay networks. By limiting the substrate to this resource provisioning role, the architecture provides tremendous flexibility for designers of interlay networks. (While our focus is on packet-oriented interlay networks, it should be noted that the diversified network model can also support circuit-like services on interlay networks, since the substrate can provide provisioned bandwidth for interlay links, allowing an interlay to emulate circuits.)

To turn the vision of a diversified Internet into reality, the research community will need to address a variety of technical challenges. One of the primary challenges centers on the design of the protocols and associated mechanisms needed to enable the automated provisioning of diverse interlays in a global, multi-domain substrate network. The problem to be solved is very different from that addressed by conventional network resource reservation mechanisms. Some of these differences make it easier, others make it more complex. Unlike flow-based reservation where end-to-end resources must be reserved for individual communication sessions, the interlay provisioning system will allocate resources on a relatively coarse-grained and long-term basis. This is because interlay networks will be set up and operated to serve large numbers of users and resources will be reserved to handle the aggregate traffic from these many users over an extended period of time. These differences dramatically reduce the scaling and performance challenges associated with resource provisioning, relative to resource reservation in conventional networks. One exception to this occurs at the access links that connect a “first-hop” substrate router to end hosts. In this context, the level of aggregation is more limited, making static provisioning of interlay link bandwidths unattractive. However, dynamic adjustment of the first hop interlay link bandwidth can be done, through a local negotiation between the end host and the first hop router. 
On the other hand, the nature of the interaction between interlay networks and a multi-domain substrate is intrinsically more complex than the relationship between a single user and a network (even a multi-domain network) providing a reserved bandwidth flow, as interlay networks must be able to get information about capabilities available through different Substrate Network Providers (SNP) and evaluate their options, in order to determine which SNPs can best serve their needs. They must then obtain the required resources from multiple SNPs and combine them to create their interlay.

A second challenge concerns how to ensure that different interlay networks can co-exist on a common substrate without interference. While there are well-understood mechanisms for multiplexing different virtual links onto a single physical link so as to isolate the different traffic streams, there is no such direct precedent for substrate routers that can be shared by a variety of interlay routers. While virtual machines in computer systems provide a suitable conceptual framework for thinking about these issues, the specific mechanisms used by virtual machines have limited applicability, since they do not address performance isolation and are not directly applicable to systems comprising a large number of parallel subsystems.

In a diversified router, there are two types of resources, in addition to link resources, that must be allocated to interlay routers and managed in a way that allows each interlay router to operate without interference: processing engine resources and switching resources. Processing Engines (PE) can come in a variety of forms. They may be built around general purpose processors, network processors, digital signal processors or configurable logic chips. A given diversified router may support multiple types of PEs to serve in different roles (packet forwarding, format conversion, per-flow signaling and control). Some interlay routers may require a single PE, while others may require multiple PEs (either for performance reasons or to provide different types of processing). It’s also possible for an interlay to require just a fraction of a PE. For PEs built around a general purpose processor this is straightforward to do, using virtual machine concepts, but it is more difficult for PEs built around network processors or configurable logic chips.
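The allocation of whole and fractional PEs to interlay routers can be pictured with a small sketch. Everything in it is hypothetical: the `PE` record, the `allocate` function, the first-fit policy and the kind labels "GP", "NP" and "CL" are illustrative names, not part of the proposed design. It also makes the simplifying assumption that only general purpose PEs are ever subdivided, which is stricter than the text above (fractional use of other PE types is described as more difficult, not impossible).

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Dict, List, Optional


@dataclass
class PE:
    name: str
    kind: str                      # "GP", "NP" or "CL" (labels are assumptions)
    free: Fraction = Fraction(1)   # share of this PE not yet allocated
    owners: Dict[str, Fraction] = field(default_factory=dict)


def allocate(pool: List[PE], interlay: str, kind: str,
             share: Fraction) -> Optional[PE]:
    """First-fit: give `interlay` a `share` of one PE of the requested kind.

    Returns the chosen PE, or None if the request cannot be satisfied.
    """
    if share <= 0 or share > 1:
        raise ValueError("share must be in (0, 1]")
    if share < 1 and kind != "GP":
        # Simplifying assumption: only general purpose PEs are subdivided.
        return None
    for pe in pool:
        if pe.kind == kind and pe.free >= share:
            pe.free -= share
            pe.owners[interlay] = pe.owners.get(interlay, Fraction(0)) + share
            return pe
    return None


pool = [PE("pe1", "NP"), PE("pe2", "GP")]
quarter_gp = allocate(pool, "interlay-A", "GP", Fraction(1, 4))
whole_np = allocate(pool, "interlay-B", "NP", Fraction(1))
```

An interlay router that needs several PEs would simply call `allocate` once per PE; a real provisioning subsystem would also have to reserve the matching switching resources.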

Managing switching resources is relatively straightforward when the interlay routers are built around a single PE or a fraction of a PE. In this case, switching resources are needed only to connect PEs to interlay links. When interlay routers use multiple PEs and switching resources are used to forward data from PE to PE, the situation becomes more complex, since PEs must be given the freedom to use their switching resources to communicate with any of the PEs within the same interlay router (that is, there are no pairwise restrictions on the use of communication bandwidth between PEs). The substrate must allow each interlay to use its switching resources freely, while simultaneously isolating each interlay from the effects of traffic variation within another interlay. This cannot be done by simply rate limiting individual PE transmissions. It requires coordination of the transmissions by different PEs into the interlay switch. There are known solutions to this problem in conventional routers [PA03, PA04], but generalizing those methods to apply to diversified routers will require new research. A flexible router platform that can be configured to implement different algorithms for managing this coordination is essential for evaluating their performance in a realistic system context. The proposed diversified router platform uses configurable logic chips to implement the substrate and switching functions, allowing experimental evaluation of a wide range of different solutions to this problem.

2.2. High Performance Data Transfer Services

High performance computing for large-scale scientific applications has played a central role in stimulating the development of computing technology for more than 40 years. The growing demands of e-science are no longer being adequately met by conventional internet technology. Diversified networking introduces the possibility of creating and deploying specialized networks for e-science that operate over the same physical infrastructure as networks providing commodity internet service, while simultaneously providing new services that are specifically tailored to the needs of scientists. Here, we describe two such services that we propose to implement and evaluate using the systems developed through this proposal, as demonstrations of the potential benefits of this technology for e-science. The first service is a fast burst reservation service for delivering up to a gigabyte of data fast enough for use in interactive applications. The second is a bulk data transfer service designed to move very large datasets (from tens of gigabytes to petabytes), using advance scheduling and taking advantage of multiple network paths.

Fast Burst Routing. Scientific applications increasingly involve interactive exploration of large, multi-dimensional data sets. This requires rapid delivery of bursts with up to a gigabyte of data at speeds that are fast enough for interactive use. Conventional internet technology is poorly suited to handling such applications, because routing is oblivious to the bandwidth an application requires and because the congestion control mechanisms are reactive, ramp up slowly for flows with large window sizes and depend on large buffers at intermediate routers, adding to delay. We propose a fast burst routing mechanism that is implemented in hardware, can select among alternate paths based on available bandwidth and bandwidth needs, can reserve bandwidth and per-flow queues at routers along the path and provide explicit rate feedback to a sender [KU02].  The data forwarding part of the protocol is simple enough to be implemented entirely in hardware (including the selection of routes), allowing wire-speed forwarding of gigabyte bursts. Explicit rate feedback allows a sender to increase its rate to match the available bandwidth in one network round-trip time and the use of per-flow queues with reserved bandwidth can keep queueing delays very small.
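The path selection step can be illustrated with a minimal sketch. It assumes each candidate path is described only by the available bandwidth on each of its links, and it breaks ties by choosing the feasible path with the fewest hops. The function names, the path representation and the tie-breaking rule are assumptions made for illustration and are not taken from [KU02]. Decimal units are assumed (1 GB = 8 Gb).

```python
from typing import List, Optional, Sequence


def bottleneck_gbps(path: Sequence[float]) -> float:
    """Available bandwidth of a path is that of its most constrained link."""
    return min(path)


def select_path(paths: List[Sequence[float]],
                needed_gbps: float) -> Optional[int]:
    """Return the index of a feasible path with the fewest hops, or None."""
    feasible = [i for i, p in enumerate(paths)
                if bottleneck_gbps(p) >= needed_gbps]
    if not feasible:
        return None
    return min(feasible, key=lambda i: len(paths[i]))


def burst_time_s(size_gbytes: float, rate_gbps: float) -> float:
    """Time to send a burst at a constant rate (decimal units assumed)."""
    return size_gbytes * 8 / rate_gbps


# Per-link available bandwidth in Gb/s for three hypothetical paths.
candidate_paths = [[10, 2, 10], [8, 8, 8, 8], [5, 6]]
chosen = select_path(candidate_paths, needed_gbps=5)
```

At a sustained 10 Gb/s, `burst_time_s(1, 10)` gives 0.8 seconds for a one gigabyte burst, which is why reaching the full rate within one round-trip time matters for interactive use.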

Figure 2. High Level System Organization
Bulk Data Delivery. The proposed bulk data delivery service is designed to facilitate the transfer of very large data volumes (from hundreds of gigabytes to petabytes). Note that a 10 TB transfer takes more than two hours, even at a 10 Gb/s transfer rate. What is important here is that transfers occur in a predictable fashion and use the available network resources in the most efficient way possible, not that they occur in a few seconds. For this reason, we adopt a “parcel delivery” model, in which a user requests a data transfer in advance, specifying the amount of data to be transferred, a time when it will be available for pickup and a deadline for its delivery to one or more specified destinations. The network pre-schedules the transfer, letting the user know immediately if it can meet the specified deadline, and offering an alternate schedule, if it cannot. The ability to deliver to multiple destinations is important in applications where the same dataset must be delivered to scientists at multiple institutions, and it’s important to avoid the inefficiency of multiple unicast transfers, which can tie up the sending site’s communication resources for long periods of time. The key to the bulk delivery model is the advance scheduling mechanism which uses global knowledge of the network topology and previously scheduled transfers to construct an efficient schedule, exploiting all available paths through the network infrastructure. In a diversified network context, such a system may also be able to obtain additional resources from the underlying network substrate during periods of unusually high demand.
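The transfer-time figure quoted above, and the accept-or-offer-alternative behavior of the parcel delivery model, can be sketched as follows. This is a deliberately simplified illustration: it assumes decimal units (1 TB = 10^12 bytes), a single path and a constant reserved rate, whereas the scheduler described above would use global knowledge of the topology and of previously scheduled transfers. The function names are hypothetical.

```python
from typing import Tuple

TB = 10**12  # bytes, decimal units assumed


def transfer_time_s(size_bytes: float, rate_bps: float) -> float:
    """Seconds needed to move size_bytes at a constant rate in bits/second."""
    return size_bytes * 8 / rate_bps


def check_request(size_bytes: float, available_at_s: float,
                  deadline_s: float, rate_bps: float) -> Tuple[bool, float]:
    """Return (accepted, finish_time_s).

    If the deadline cannot be met, finish_time_s is the earliest completion
    time, which could be offered to the user as an alternate schedule.
    """
    finish = available_at_s + transfer_time_s(size_bytes, rate_bps)
    return finish <= deadline_s, finish


# 10 TB at 10 Gb/s: 8000 s, about 2.2 hours.
seconds = transfer_time_s(10 * TB, 10e9)
hours = seconds / 3600
```

With these numbers, a request with a two hour deadline is rejected and an 8000 second completion time is offered instead.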

3. Description of Research Instrumentation and Needs

The project will develop a diversified router platform to enable networking researchers to more easily construct experimental networks that explore new and potentially disruptive architectural innovations. It will enable researchers to create systems that demonstrate new network architectural innovations in a form that allows them to be deployed and used to carry live traffic, even in settings that demand high levels of performance. The system is designed to take advantage of the Advanced Telecommunications Computing Architecture (ATCA) standards, which will allow it to leverage certain standard components, leading to a more robust platform. While we expect single chassis systems to be sufficient for most applications, the design includes optional inter-chassis communications mechanisms to enable the construction of larger systems.

3.1. Design Concept and Rationale

The high level system architecture is shown in Figure 2. It consists of a set of Processing Engines (PE) that are connected to each other and to Line Cards (LC) through a switching subsystem. We plan to develop several types of PEs, which will differ in performance and ease of use. The LCs connect to external links and support multiplexing of interlay links onto a single substrate link. They will be implemented using FPGAs, allowing them to be easily configured to add additional headers to selected substrate links to enable tunneling of substrate links across other lower layer networks. The separation of PEs from LCs provides maximum flexibility in the allocation of PE resources to different interlay routers.

3.1.1. Advanced Telecommunications Computing Architecture

Figure 3. Selected ATCA Components

The ATCA standards have been developed to facilitate the construction of carrier-class communications systems that incorporate subsystems supplied by multiple vendors. They define standard physical components and some standard patterns for how to use those components to construct high performance systems. They have attracted broad industry support and are expected to lead to the development of a range of inter-operable subsystems that will allow more cost effective and flexible development of new communication systems.

Figure 3 shows a standard 14 slot ATCA chassis with backplane, power distribution system and cooling fans. Such chassis are now available from several vendors. The backplane includes standard signals for clock distribution and low level system management. It also defines fabric connections that implement several interconnection topologies for high speed inter-board communication (differential signal pairs suitable for 2.5 Gb/s data rates). ATCA standardizes key aspects of the boards that are used with the chassis, including physical size, connector type and placement and the use of certain of the connector signals. It also defines standards for mezzanine cards that can be optionally used with an ATCA base card. In addition, it defines standards for optional Rear Transition Modules (RTM), which are small cards that are inserted into the back side of the chassis and are typically used for interconnecting multiple chassis in larger systems. These elements of the standard are also shown in Figure 3.

One attraction of the ATCA standard is the expectation that it will lead to the development of standard components that can be used directly in different systems or adapted for different purposes. We plan to take advantage of such opportunities where possible, but expect to have to develop several key hardware components as part of this project. Figure 4 shows a development board that has been created by Xilinx and is now available from Avnet [XLNX]. The board includes a Virtex 2 Pro FPGA which terminates 16 high speed backplane signals providing a total bandwidth of 40 Gb/s (in each direction). The board is supplied with complete schematics and design notes to allow developers to create systems that leverage certain design elements of the development board. We plan to adapt this board to create the Carrier Board that is described below.

Figure 4 also shows an ATCA mezzanine card from Artesyn that provides a processor subsystem with a 1.8 GHz Pentium M processor with up to 2 GB of SDRAM and 2 MB of L2 cache. An ATCA base board can host up to four such mezzanine cards. Although this card does not include a disk, mass storage cards in the same format are expected to become available in the near future, allowing configuration of systems hosting large numbers of complete general purpose processing subsystems.

3.1.2. Card Types

Figure 4. Sample ATCA Components: Xilinx/Avnet Development Card and Artesyn Pentium Processor Module

To provide the flexibility to support a range of research activities, we plan to implement three different types of Processing Engines (PE). The first will implement a Network Processor (NP) subsystem and is referred to as the PE/NP card. It is illustrated in Figure 5. The NP subsystem includes an Intel IXP 2850 network processor, with three banks of RDRAM, three banks of QDR SRAM and a TCAM-based search engine for packet classification and other applications requiring high throughput associative lookup. The IXP 2850 has 17 on-chip processors, including an XScale processor used for system management and 16 simple, but high performance 32 bit RISC processors that deliver high volume data processing. We plan to implement the PE/NP as a double-wide mezzanine card (such configurations are supported in the ATCA standard) that will plug into a Carrier Card (CC) that includes power conversion and low level system management components plus a Substrate component implemented using a Xilinx V2Pro/P100 FPGA. This device has two on-chip PowerPC processor subsystems, one of which will be used to configure the hardware to support different interlay routers. The V2Pro/P100 also provides 20 high speed IO channels (2.5 Gb/s each), over a megabyte of on-chip SRAM and approximately 88,000 basic logic cells, each of which includes a 4 input logic function generator and a flip flop. The substrate chip terminates a SPI4 interface and a PCI interface from the PE/NP card. It also provides high speed IO connections to the backplane and to an optional Interchassis Link Card (ILC), which is implemented as a Rear Transition Module (RTM).

Figure 6 shows the second and third types of PEs that are planned. The first of these is based on a configurable logic chip and is referred to as the PE/CL. Like the PE/NP, it is implemented as a double-wide mezzanine card hosted on the same carrier card. The PE/CL includes a Xilinx V2Pro/P100 FPGA with two 400 MHz DDR DRAMs, three 400 MHz by 36 wide QDR SRAMs and a TCAM. This provides the resources needed to implement high performance packet processing. The reconfigurability of the FPGA allows the same card to implement a wide range of different types of processing, making it suitable for implementing different interlay routers. Note that while the physical interface between the substrate and the PE/CL card is the same as for the PE/NP card, the physical connections are used to implement different signals. This flexibility is made possible by the fact that the substrate is an FPGA that can be configured differently for different mezzanine cards. In the case of the PE/CL, we plan to implement a 10 Gb/s data channel using a 16 bit wide bus operating at 625 MHz. The substrate will also be connected to the SelectMap interface of the P100 on the PE/CL, allowing the substrate to download new configurable logic files to the PE/CL to configure it for different purposes. This in turn makes it possible for a remote configuration processor to send new configurable logic files to the substrate, so that it can configure the PE/CL for different interlay routers.

Figure 5. Network Processor based Processing Engine
We also plan a third type of PE that hosts a general purpose processor card, as shown in Figure 6. The PE/GP will have two single-wide mezzanine slots, each of which is capable of supporting a general purpose processor card, or optionally a storage card. Such a card is useful for implementing higher level system management functions and is attractive for researchers who prefer to implement their research ideas in the more familiar development environment provided by a general purpose processor running a standard operating system, such as Linux. We expect to be able to purchase the mezzanine card in this case, rather than develop it ourselves.

3.1.3. Line Cards

In addition to the PE cards, we plan to develop a 10 Gb/s Line Card. This will be implemented as another double-wide mezzanine card containing a 10 GE MAC chip and the associated physical interface components.  It will connect to the substrate via a SPI4 interface. We have chosen not to implement a line card supporting multiple 1 Gb/s interfaces. While such a line card would be useful, there are now reasonably priced gigabit Ethernet switches with 10 Gb/s uplinks starting to emerge. So, the ability to connect to large numbers of 1 Gb/s Ethernet interfaces can be obtained without the added cost of developing a separate line card.

3.1.4. Distributed Switching

The proposed system design is based on a distributed switching concept that eliminates the need for separate switch cards, reducing development effort and reserving more of the available chassis space for processing engines and line cards. The 14 slots of the ATCA chassis will be divided into two subsets, referred to as odd and even. The substrate chip in each odd slot will have a 5 Gb/s link connecting it to the substrate chip in each even slot. Thus, each substrate chip will terminate seven of these links, which will each be implemented using a pair of the 2.5 Gb/s serial links available on the substrate chip. So the backplane connections will use 14 of the 20 gigabit serial links present on the substrate, leaving six links to connect the substrate to the RTM that provides inter-chassis communication.

The odd-even connection topology allows direct data transfer between odd slots and even slots. Communication between pairs of odd slots (or pairs of even slots) takes place in two hops. Since most communication will take place between line cards and processing engines, we can arrange for most data transfers to be single hop by placing line cards in even slots and processing engines in odd slots. This need not be a strict division, but does enable the most efficient use of the available bandwidth. Note that line cards can originate and terminate 10 Gb/s of traffic while the substrate has 35 Gb/s of backplane bandwidth available to it. We also expect processing engines to handle no more than about 10 Gb/s of traffic, although this will vary depending on the functions a PE is required to perform in a given interlay router. Consequently, we expect the bandwidth available for communication within the chassis to be more than adequate.
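The link budget and hop counts of the odd-even topology can be checked with a short sketch. The constants come from the description above (14 slots, 20 serial links of 2.5 Gb/s per substrate chip, two serial links per backplane link). Numbering the slots 1 to 14 and the function names are assumptions made for illustration.

```python
from typing import Dict, List

SLOTS = 14
SERIAL_LINKS_PER_CHIP = 20
SERIAL_RATE_GBPS = 2.5
SERIALS_PER_BACKPLANE_LINK = 2   # each 5 Gb/s link is a pair of serial links


def backplane_peers(slot: int) -> List[int]:
    """Slots directly connected to `slot`: all slots of opposite parity."""
    return [s for s in range(1, SLOTS + 1) if s % 2 != slot % 2]


def hops(src: int, dst: int) -> int:
    """Backplane hops between two slots in the odd-even topology."""
    if src == dst:
        return 0
    return 1 if src % 2 != dst % 2 else 2


def link_budget(slot: int) -> Dict[str, float]:
    """Serial link usage and backplane bandwidth for one substrate chip."""
    links = len(backplane_peers(slot))
    serial_used = links * SERIALS_PER_BACKPLANE_LINK
    return {
        "backplane_links": links,
        "serial_links_used": serial_used,
        "serial_links_left": SERIAL_LINKS_PER_CHIP - serial_used,
        "backplane_gbps": serial_used * SERIAL_RATE_GBPS,
    }


budget = link_budget(1)
```

For any slot this reproduces the figures in the text: seven backplane links, 14 serial links used, six left for the RTM, and 35 Gb/s of backplane bandwidth.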

While we expect that for most applications a single chassis system will be sufficient, we have designed the system to support optional inter-chassis connections implemented with Rear Transition Modules. In a multi-chassis system, each slot will include an Inter-chassis Line Card (ILC) with six gigabit serial links (providing 15 Gb/s of bandwidth) connecting it to the substrate chip on the Carrier Card. These six gigabit serial links will connect to an FPGA on the ILC, which also connects to a 12 channel Parallel Optical Receiver and a 12 channel Parallel Optical Transmitter component. These components are available from several manufacturers and use VCSEL arrays to send parallel data streams over optical ribbon cables. Thus, each cable can provide 30 Gb/s of inter-chassis communication bandwidth, for each slot that is equipped with an ILC.

Figure 6. FPGA and General Purpose Processor based Processing Engines

Our initial plan is to implement inter-chassis communication using the multi-ring configuration illustrated in Figure 7. This makes the interconnection particularly simple and makes routing straightforward. For systems with up to four chassis, the inter-chassis bandwidth available on such a multi-ring configuration is sufficient for worst-case traffic conditions, and ring configurations with up to eight chassis are feasible. It’s worth noting that the physical components provided for inter-chassis communication can also be used to implement a variety of other communication schemes, by exploiting the reconfigurability of the FPGA on the ILC. For example, a 15 chassis system can be implemented with a single direct connection between each pair of chassis. In such a configuration the vast majority of inter-chassis traffic can be handled with a single inter-chassis hop.
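To see why 15 chassis is the natural size for the fully connected alternative, note that each chassis has 14 slots and therefore up to 14 ILC cables, one for each of the other 14 chassis. The sketch below shows one possible cabling rule; the rule itself (slot s of chassis c connects to chassis c+s+1 modulo 15) and the zero-based numbering are our assumptions, chosen only to demonstrate that a consistent assignment exists.

```python
from itertools import combinations

CHASSIS = 15    # chassis in the fully connected configuration
ILC_SLOTS = 14  # one ILC, and hence one optical cable, per slot


def mesh_peer(chassis, slot):
    """Return (peer_chassis, peer_slot) for the cable leaving this ILC.

    Hypothetical rule: slot s of chassis c connects to chassis (c + s + 1) mod 15,
    and arrives there on slot 13 - s, so both ends agree on the cable.
    """
    peer = (chassis + slot + 1) % CHASSIS
    peer_slot = ILC_SLOTS - 1 - slot
    return peer, peer_slot


def build_mesh():
    """Enumerate every cable as an unordered pair of (chassis, slot) endpoints."""
    cables = set()
    for c in range(CHASSIS):
        for s in range(ILC_SLOTS):
            p, ps = mesh_peer(c, s)
            assert mesh_peer(p, ps) == (c, s)  # both ends name the same cable
            cables.add(frozenset({(c, s), (p, ps)}))
    return cables


cables = build_mesh()
pairs = {frozenset(end[0] for end in cable) for cable in cables}
# Every pair of chassis is joined by exactly one cable.
print(len(cables), pairs == {frozenset(p) for p in combinations(range(CHASSIS), 2)})
```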

Figure 7. Multi-chassis communication

3.2. Development Methods

The proposed diversified router will be developed by Washington University’s Applied Research Laboratory in collaboration with a system development partner that will handle the printed circuit board schematic design, layout and fabrication. The ARL faculty, staff and students will develop the FPGA-based logic for the substrate, line card and RTM, plus the control software needed to configure the system for different interlay routers.

The configurable logic is being designed using industry-standard design tools, with which the ARL staff has extensive experience. Hardware design is carried out using VHDL, which allows designers to operate at relatively high levels of abstraction, while still making effective use of the underlying hardware capability. The combination of VHDL with industry-standard logic simulation and timing verification tools makes it possible to verify the operation of system components and correct the vast majority of design errors in simulation. This greatly reduces the problems that must be diagnosed and corrected in a laboratory setting. In addition, ARL staff members have extensive experience implementing network elements using VHDL and FPGAs. This means that we have a substantial library of components that can be adapted and re-used in new systems.

Development of software for the embedded PowerPC processors in the Xilinx FPGAs will be done using the Embedded Development Kit (EDK) provided by Xilinx for that purpose. The EDK includes tools for assembling PowerPC subsystems from pre-defined hardware building blocks, as well as cross-compilers and debugging tools (including simulators) for verifying the correct operation of programs before transferring them to the hardware platform. The role of the embedded software will be to respond to control messages received from a remote control processor via a direct Ethernet connection. We have developed similar software for the experimental routers used in our Open Network Laboratory. We plan to adopt a similar approach in the development of both the embedded software and the remote control software, allowing us to re-use design elements, if not large amounts of actual code.

4. Impact of Infrastructure Projects

The proposed diversified router platform can be used in a wide variety of research settings and by many different research groups. This section briefly describes three particular contexts where it can be used.


4.1. Research on Diversified Networking and Advanced Network Services

Figure 8. Screenshot from Remote Lab Interface for Open Network Laboratory

The emerging concept of diversified networking is a central theme of the research activities now being pursued within Washington University’s Applied Research Lab. A flexible, high performance platform of the sort we are proposing is needed to enable this research to move beyond the conceptual stage. It will make it possible for us to develop and experimentally evaluate the mechanisms needed to implement diversified networking and to demonstrate how a multiplicity of networks with widely varying characteristics can be implemented on a single, common infrastructure.

A diversified router platform can also be a valuable experimental research tool, even for groups with little or no interest in network diversification as a new paradigm for networking. A diversified router will allow researchers to create realistic implementations of new network architectures and services, evaluate their performance and create demonstrations to communicate the value of their ideas to others. A flexible, open platform built around FPGAs and network processors allows exploration of architectural innovations that cannot be evaluated effectively using commercial routers (due to their closed architectures). A platform that allows multiple architectures to share the same hardware infrastructure will allow different approaches to a given problem to be evaluated concurrently.

4.2. Possible Role in a National Network Research Testbed

At a recent NSF-sponsored workshop, a representative group of leading networking researchers raised the growing concern that the increasing ossification of the Internet is making it difficult for new network architectures and architectural innovations to have an impact on the core protocols that define the Internet today [WNSF]. The resistance of the Internet to change makes it difficult to address its many limitations (particularly with respect to security, reliability and scalability) and prevents the introduction of new services needed to support advanced applications. Although the report of the workshop is still under preparation, it is expected to include recommendations to NSF for the creation of a wide-area network research testbed including a high bandwidth backbone using fiber optic transmission facilities available through the National Lambda Rail project [NLR]. Such a testbed would be designed to simultaneously support multiple experimental networks operating in parallel on the same underlying infrastructure, making the diversified router platform proposed here a natural candidate for implementation of high performance testbed nodes.

4.3. The Open Network Laboratory

Washington University is close to completion of a project to create an Open Network Laboratory (ONL) for use by the networking research community, for evaluation of advanced networking concepts in a realistic experimental environment. The laboratory is built around a set of open-source, gigabit routers that were developed at Washington University [CH97, CH02, DE01, LO00, TA03], and which are being made available to remote users through a Remote Laboratory Interface (RLI). The RLI allows users to remotely configure the routers (modify routing tables, install packet filters at router input and output ports, modify parameters of the queueing subsystems) and extend, modify or replace the software running in the routers’ embedded processors. The RLI also allows users to run applications on the PCs that serve as end systems in the experimental networks and monitor those running applications using the built-in data gathering mechanisms that the routers provide. Support for data visualization and real-time remote display is provided, allowing users to develop the insights needed to understand the dynamic behavior of the network when carrying out experiments. An example screenshot illustrating various elements of the RLI is shown in Figure 8. The RLI automates most of the mundane tasks associated with setting up and running an experiment, making it much easier for research users to get results, and minimizing the learning curve normally required for using complex experimental equipment. We plan to integrate the proposed diversified router platform into the ONL infrastructure, making its capabilities available for use by other researchers throughout the country. This new platform will provide a much richer set of resources and capabilities than the routers now available in the ONL, enabling the ONL to support a wider range of network research activities.

4.4. Broader Impacts

The research enabled by the development of a diversified research platform can have a broad impact that extends well beyond the networking research community. The diversified networking concept has the potential to transform the Internet, enabling new network technologies to be introduced at any time, stimulating competition among alternatives and preventing the recurring problem of network ossification that now plagues the Internet. The creation of such a platform can provide a compelling demonstration of the feasibility of these ideas and the potential of network diversification to change the world.

Washington University has a long track record for integrating its research into its curriculum and its training of graduate students. Graduate students will play a key role in the development of the diversified router and in the associated research projects that will surround its development. Previously developed experimental systems are already featured in networking courses open to both graduate students and undergraduates. The Open Network Lab is dramatically improving the accessibility of these systems to larger numbers of students and we expect it to have a profound impact on students at all levels. Students who work with such experimental systems both learn about the underlying principles on which they are built and have the opportunity to engage in hands-on experimentation. Our graduates have carried these experiences with them as they have gone on to careers in academia, research labs and industry, extending the impact of these activities well beyond Washington University. We plan to similarly integrate the diversified router platform and the results of our related research into the curriculum.

Washington University has been very successful at providing a supportive environment for minority students across the institution. As one indicator of this success, the Journal of Blacks in Higher Education [JB02] has ranked Washington University fourth in the nation in a survey ranking universities according to their “relative success in attracting, enrolling, and graduating African-American students and in bringing black professors to their campuses.” The report states:

Washington University was rated the best in two of the 13 categories in our survey. Some 84 percent of all entering black students at Washington University go on to graduate. This is a high graduation rate and only two percentage points below the rate for entering white students. The racial graduation gap was smaller at Washington University than at any other high-ranking university. In addition, blacks make up a very high 3.1 percent of the tenured faculty at the university. This is up from 1.2 percent only five years ago. This progress was unmatched by any other university in our survey.

The university also has established a program of graduate fellowships for minority students. Two of these Chancellors’ Fellows are currently pursuing their doctoral degrees in the Department of Computer Science and Engineering. While the university still has much to do to improve the representation of women and minority students in engineering and computer science, it has made considerable progress and is committed to continuing that progress in the future. The research programs that will be enabled by the development of the diversified router platform can contribute to that progress by providing a vehicle to support the educational aspirations of talented women and minority students pursuing careers in networking and communications.

5. Management Plan

Washington University’s Applied Research Lab has full-time technical staff with the skills and expertise to develop the hardware and software components for the proposed diversified router. ARL has a long track record of accomplishment in the design and development of such advanced experimental prototypes [CH02, CH97, DE01, LO00, TA03] and has produced systems in quantity for distribution to other research groups [KI98]. We plan to subcontract the design and fabrication of the individual printed circuit boards of the diversified router to a system development company, so that our staff can focus on the design of the configurable logic for the substrate, the embedded control software required to configure the processing engines for use in different interlay routers, plus additional configurable logic and software building blocks needed to enable researchers to more easily use the processing engines.

We have had preliminary discussions with several system development companies, including GDA Technologies [GDAT] and Sanmina-SCI [SAN], and have obtained budgetary estimates for the design services needed to implement the various circuit boards. The proposal budget is based on the estimates obtained through these discussions. Our selected partner will develop board schematics and layouts, under the direction of the Washington University staff. They will also contract with a PC board fabricator to produce and assemble the circuit boards. We plan to have the partner make the boards available for sale to any interested third party. Through this mechanism, other academic or industrial research groups will be able to obtain copies of the hardware for use in their own research labs. Washington University is committed to providing open-source versions of all configurable logic and software needed to use the systems effectively and will cultivate the development of an open-source community to bring greater value to the overall effort.

5.1. Project Management Team

The project will be managed by Jonathan Turner, Sever Professor of Engineering in the Department of Computer Science and Engineering at Washington University and the Director of the Applied Research Lab (ARL). He will be responsible for the overall system design and coordinating the development of the various components of the system. Dr. Turner has been involved in many aspects of networking research for over 20 years, with a focus on the design and analysis of switching systems and routers [CH97, CH02, KA04, KU03, PA03, PA04, SP03, TA02, TU94, WA97]. He has many widely cited publications, more than 25 patents and was a co-founder of a successful startup company that developed switching components for high performance routers and switches. His current focus is on creating technologies for diversified networking. ARL has a long track record of building complete experimental network systems that demonstrate key research ideas and provide a platform for experimentation and application demonstrations. The lab currently has five full-time staff members who work with the affiliated faculty and more than 20 graduate students on a variety of projects. These resources allow the lab to pursue the more challenging systems projects that are essential for advancing the state-of-the-art in a complex systems area like networking.

Figure 9. Development Schedule

Co-PIs John Lockwood and Patrick Crowley will also play important roles in managing the system development. Dr. Lockwood has extensive expertise in the development of systems using advanced FPGAs and the application of configurable logic to a variety of problems in networking and network security [DH03, JO04, LO00, LO01, LO03a, LO03b, MA04, SC03, SC04a, SC04b]. He will play a central role in the development of the Carrier Card, PE/CL card and the configurable logic that must be developed for use with these components. Dr. Crowley has emerged as a leading academic researcher in the area of network processor architectures and performance. His work was among the earliest published research concentrating on NPs and demonstrated how thread-parallel architectures could greatly out-perform architectures relying on instruction-level parallelism [CR99, CR00a, CR00b, CR02, CR03, KU04]. He has conducted tutorials on NPs at several major research conferences and has extensive experience with the Intel IXP family of NPs. He will play the leading role in the development of the PE/NP card and the development of the associated software needed to use it for implementing novel interlay routers.

5.2. Task Breakdown and Schedule

The project has been broken down into a series of smaller tasks which are detailed in the following paragraphs. Figure 9 identifies the different phases for each task and the planned schedule for carrying them out.

·   Task 1 – System Specification. This task is primarily concerned with developing a detailed system specification and negotiating a contract with the system development company that will design and produce the circuit boards for the system.

·   Tasks 2 through 6 – PC Board Development. Each of these tasks is concerned with the development of one of the printed circuit boards. For each board, there will be periods devoted to schematic capture, board layout, prototype fabrication, verification and, if necessary, a re-spin to correct any serious problems that might arise in the design. These tasks are offset from one another in time to maximize opportunities for design re-use and to maximize continuity of the personnel.

·   Tasks 7 through 9 – Configurable Logic Development. Three of the boards that implement the substrate of the diversified router have FPGAs that must be configured to allow them to implement their assigned function. The FPGA on the substrate board will implement the switching function, using configuration tables that define communication paths among the elements that implement each interlay router. The FPGA on the Line Card will provide separate queues for different interlay links and will route arriving packets to the appropriate processing engines, based on its internal configuration tables. The FPGA on the ILC will implement a Ring Interface for sending and receiving packets on the inter-chassis rings. For each configurable logic chip, there will be periods devoted to high level design, VHDL coding, functional verification, timing closure and system verification.

·   Task 10 – Control Software Development. We plan to develop control software to allow the diversified router to be remotely configured through control messages sent from a remote computer system. These messages will be processed by the embedded PowerPC processors in the FPGAs on the various boards. A remote control processor (CP) will send messages to one PowerPC over an Ethernet connection provided for control, and that PowerPC will forward messages to the others within the same chassis across the backplane. This task requires the development of software on both the remote CP and the embedded processors.

·   Task 11 – IPv4 Interlay Routers. As an initial demonstration, we plan to implement two IPv4 interlay routers, one using the PE/NP and one using the PE/CL. In addition to serving as a basic demonstration of system operation, they will provide a base configuration that can be extended and adapted by other researchers. While this task would be a substantial project by itself if it had to be implemented “from scratch”, we have already implemented IPv4 routing functionality in configurable logic in previous projects and we intend to port that design to the new platform. Similarly, there is substantial existing software for IPv4 packet forwarding for the IXP 2850 network processor, making the size of the development task for the NP version manageable.
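The table-driven behavior described for the Line Card FPGA in Tasks 7 through 9 can be modeled in software as follows. This is a minimal behavioral sketch, not the planned logic design: the class and method names are hypothetical, and it assumes that each arriving packet can be associated with an interlay link identifier, a mechanism the task descriptions do not specify.

```python
from collections import deque


class LineCardModel:
    """Behavioral model of the Line Card's table-driven packet handling."""

    def __init__(self):
        self.table = {}   # interlay link id -> slot of the processing engine
        self.queues = {}  # interlay link id -> queue of departing packets

    def configure(self, link_id, pe_slot):
        """Install a configuration table entry (assumed to come from control software)."""
        self.table[link_id] = pe_slot
        self.queues.setdefault(link_id, deque())

    def route_arrival(self, link_id, packet):
        """Return (pe_slot, packet) for a configured link, or None to drop."""
        pe_slot = self.table.get(link_id)
        if pe_slot is None:
            return None
        return pe_slot, packet

    def enqueue_departure(self, link_id, packet):
        """Place a departing packet on the separate queue for its interlay link."""
        if link_id not in self.queues:
            return False
        self.queues[link_id].append(packet)
        return True


lc = LineCardModel()
lc.configure(link_id=7, pe_slot=3)
print(lc.route_arrival(7, b"payload"))   # forwarded to the PE in slot 3
print(lc.route_arrival(8, b"payload"))   # no table entry, so dropped
```

Keeping the forwarding decision in a table that control software can rewrite is what allows the same hardware to serve different interlay routers.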

5.3. Risks and Strategies for Risk Mitigation

The entire project has been designed with considerable attention to mitigating risks and maximizing the chances of success. The leveraging of standard ATCA components is one important part of this strategy. A second is the use of a system development partner to carry out the printed circuit board design and layout. These tasks require specialized CAD tools, detailed schematic parts libraries and knowledge of PC board fabrication constraints. While ARL has done this kind of work in the past, we have found that it is difficult for a lab that does a limited number of board designs to stay current with the latest tools and to meet the design challenges for systems that require a large number of high speed (up to 3.125 GHz) differential signals. Another fundamental aspect of the project that tremendously reduces risk is the systematic use of FPGAs to implement major system functionality. FPGAs allow design errors in system logic to be easily corrected at any time. They even make it possible to work around many types of board-level errors that would make a system built around fixed-function chips inoperable. The flexibility of FPGAs also allows the basic hardware platform to be adapted to a wide variety of different research objectives, ensuring that the resulting platform will continue to be useful, even after the original research agenda for which it was built has been completed.

The division of components into boards and mezzanine cards was also done with an eye to minimizing risk. Each individual part of the system implements a limited amount of system functionality. The interchangeability of the mezzanine cards allows corrections to be made to individual cards without affecting the Carrier Card. The implementation of inter-chassis communication using RTMs distributes the switching function and separates the inter-chassis communication from other system functions, limiting the potential for flaws in one part to have a serious impact on another.

At the same time, no project of this magnitude is without risk. The most serious concern for this project is that it is quite ambitious in its scope. While we believe that the budget and development plan are realistic, we are well aware that it’s easy to misjudge the amount of effort that such a large project may require. Should our estimates of budget and schedule prove too optimistic, we can still achieve most of our objectives by eliminating certain less essential tasks. In particular, Tasks 6 and 9, which are concerned with inter-chassis communication, could be deferred without affecting most uses of the proposed systems (since we expect single chassis systems to be sufficient for most research applications). Similarly, Task 11 could be deferred without sacrificing any essential objectives of the project.

6. Results from Prior NSF-Funded Infrastructure Projects

Gigabit Network Technology Distribution Program (ANR 6037401, 7/96-6/01, $3,087,515). This was an infrastructure program that provided systems researchers with open high performance networking equipment for use in experimental systems research. The project was motivated by the observation that in recent years, research in networking, distributed systems and high performance computing has been hampered by the research community’s limited access to high performance networking equipment, and more importantly, the detailed technical information needed to use it effectively in experimental systems research programs. Systems researchers require detailed technical information regarding their research infrastructure and the ability to modify that infrastructure by replacing software, adding hardware extensions or modifying existing hardware. These needs are incompatible with the objectives of commercial vendors who are unwilling to make their products’ proprietary details public and who are ill prepared to support activities by research users. Even when companies occasionally make technical information available to researchers, limitations on publication can make it difficult for others in the research community to verify results and extend them. The same constraints on the use of commercial equipment drastically limit its utility in education, since students in systems-oriented courses have limited opportunity to explore the internals of a non-trivial system and make experimental modifications. This program addressed this need by providing researchers at 30 U.S. universities with Gigabit Network Kits comprising a WUGS-20 ATM switch, a set of gigabit network interface cards, supporting software, training and periodic workshops, where participants shared experiences and ideas [KI98].
The program was extended to support the provision of Smart Port Cards and Field Programmable Port Extenders to participants, allowing them to use their systems as multiservice switch/routers, and experiment with active networking and related concepts. The lessons learned from this program have shaped the more recent initiative to develop an Open Network Laboratory. We expect the ONL to drastically reduce the learning curve for research users of experimental networking equipment, making it accessible to a much larger research community.

Prototype and Distribution of Reprogrammable Queuing Modules for Scalable Input Buffered Switches (ANI 0096052, 7/99-7/02, $773,000). Through ANI-0096052, an open platform called the Field Programmable Port Extender (FPX) was designed and implemented [LO00, LO01]. Copies of the FPX board have been produced for distribution to participating institutions in the Gigabit Network Technology Distribution Program. The FPX enables the rapid prototyping and deployment of packet processing modules in reprogrammable hardware. It includes a large FPGA with four independent memory banks (two SRAM and two SDRAM), which can be remotely programmed over the network, and is intended to implement packet processing functions in hardware. Generalized packet processing functions have been implemented as dynamic hardware plugin modules. One such module implements IP address lookup at rates of more than 10 million packets per second, and also supports fast updates without interfering with packet processing. Software has also been developed to modify the hardware data structures from a remote location in response to user commands. A suite of tools called NCHARGE (Networked Configurable Hardware Administrator for Reconfiguration and Governing via End-systems) was developed to simplify the co-design of hardware and software modules on the FPX. Using NCHARGE, an Application Programming Interface (API) has been developed to standardize communication among modules. A set of web-based tools has also been implemented to simplify the control and configuration of the FPX platform.
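For readers unfamiliar with the IP address lookup function mentioned above, the sketch below illustrates longest-prefix matching with a simple binary trie. It is a software illustration of the problem only: it is not the algorithm used in the FPX module, whose design is not described in this section, and the class name and next-hop labels are hypothetical.

```python
import ipaddress


class PrefixTrie:
    """Binary trie for IPv4 longest-prefix match (illustrative, not the FPX design)."""

    def __init__(self):
        self.root = [None, None, None]  # [child for bit 0, child for bit 1, next hop]

    def insert(self, prefix, next_hop):
        """Add or update a route; an update only touches nodes on one path."""
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            b = (bits >> (31 - i)) & 1
            if node[b] is None:
                node[b] = [None, None, None]
            node = node[b]
        node[2] = next_hop

    def lookup(self, address):
        """Return the next hop of the longest matching prefix, or None."""
        bits = int(ipaddress.ip_address(address))
        node, best = self.root, self.root[2]
        for i in range(32):
            node = node[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]  # remember the most specific match seen so far
        return best


routes = PrefixTrie()
routes.insert("10.0.0.0/8", "port A")
routes.insert("10.1.0.0/16", "port B")
print(routes.lookup("10.1.2.3"))   # matches both prefixes; the /16 wins
print(routes.lookup("10.2.0.1"))   # only the /8 matches
```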