CS/COE 535M Acceleration of Algorithms in Reconfigurable Hardware Lockwood, Fall 2001

Machine Problem 1

Text Processing in Reprogrammable Hardware

Assigned Wednesday, September 5, 2001
Due Date Part 1: Thursday, September 13, 2001, 4pm
Part 2: Thursday, September 20, 2001, 4pm
Purpose: Introduction to the FPX Environment and CAD Synthesis Tools
Points: 50

Introduction

The FPX provides simple and fast mechanisms to process cells or packets directly in hardware. Because all computation is performed in FPGA hardware, cells and packets can be processed at the full line speed of the card (currently 2.4 Gbit/s).

For this assignment, you will code a sample application called 'Hello World'. This circuit will be implemented as a module on the FPX. The application uses the FPGA hardware to search for a string on a particular flow and selectively replace contents of the payload. Your resulting circuit should operate at 100 MHz or faster on a Xilinx XCV1000E-FG680-7 FPGA. It should also occupy less than 1% of the available gates on the device.

Background: Simplified RAD Entity

The FPX is divided into two major components, the Networking Interface Device (NID) and the Reprogrammable Application Device (RAD). You will implement all circuits as modules in the RAD.

Figure 1: Generic RAD Entity

The simplest configuration for a RAD module is shown in Figure 1. As with all RAD modules, the circuit operates at the 100 MHz frequency of RAD_Clock. RAD_Reset is asserted (active high) synchronously with RAD_Clock before data arrives. The RAD has two interfaces: one typically carries data from the switch (egress), and the other typically carries data from the line card (ingress). Modules can be mapped to either interface. For this assignment, the design is mapped to the switch (sw) side of the RAD.

Data arrives as cells on a 32-bit data bus, DataIn[32]. Data leaves the module on the DataOut[32] bus. In general, a module can add, modify, delete, or delay cells.

The arrival of a new cell on the bus is indicated by the StartOfCell (SOC) signal. This signal goes high to indicate that the bus contains the first word of the cell. In order to send data, a module simply asserts SOC when it has a new cell ready to transmit.
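To make the SOC framing concrete, here is a small Python model of the bus, offered as a sketch rather than part of the course testbench. It assumes a fixed cell length of 14 words (the count given in the TWOSTATE_MODULE description in Part 1) and represents an idle bus cycle as data `None`; the helper names `cells_to_bus` and `bus_to_cells` are hypothetical.

```python
from typing import List, Optional, Tuple

CELL_WORDS = 14  # words per cell, taken from the TWOSTATE_MODULE description

# One bus cycle as (soc, data); data is None on an idle cycle.
Cycle = Tuple[bool, Optional[int]]


def cells_to_bus(cells: List[List[int]], gap: int = 0) -> List[Cycle]:
    """Serialize cells onto the 32-bit bus, asserting SOC with each first word."""
    cycles: List[Cycle] = []
    for cell in cells:
        if len(cell) != CELL_WORDS:
            raise ValueError(f"expected {CELL_WORDS} words, got {len(cell)}")
        for i, word in enumerate(cell):
            cycles.append((i == 0, word & 0xFFFFFFFF))
        cycles.extend([(False, None)] * gap)  # idle cycles between cells
    return cycles


def bus_to_cells(cycles: List[Cycle]) -> List[List[int]]:
    """Rebuild cells: SOC opens a cell, which then collects CELL_WORDS words."""
    cells: List[List[int]] = []
    current: Optional[List[int]] = None
    for soc, data in cycles:
        if soc:
            current = []
            cells.append(current)
        if current is not None and data is not None and len(current) < CELL_WORDS:
            current.append(data)
    return cells
```

Because SOC alone marks a cell boundary, the receiver only needs SOC plus a word count to find where each cell ends; idle cycles between cells are simply skipped.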

The TransmitCellAvailable signal (TCA) is used for flow control. A module can block the arrival of a new cell by asserting this signal no less than 6 cycles before the end of the previous cell. Modules must defer the transmission of cells if the outgoing interface is congested, as indicated by downstream TCA.

The 'Hello World' application never creates cells and never delays cells by more than a few clock cycles, so it never creates network congestion. 'Hello World' can therefore simply map the outgoing TCA indicator to the incoming interface.

Description of Assignment

This assignment has two parts. The purpose of the first part of this assignment is to help you become familiar with the tools and the testbenches that you will use to simulate your modules for the rest of the semester. The second part of this assignment introduces you to the recommended design flow for creating FPGA circuits that can be downloaded onto the FPX.

The RAD contains a circuit called RAD_LOOPBACK, which instantiates another circuit called RAD_LOOPBACK_CORE. RAD_LOOPBACK_CORE receives data from the NID, passes it to the egress module and then to the ingress module, and finally returns the cells to the NID. The circuit that actually loops the data for both the egress and the ingress module is called LOOPBACK_MODULE. In this assignment, you will replace one of the LOOPBACK_MODULEs with a module of your own that transforms the contents of cells to say "Hello World".

Part 1 : Simulation and Synthesis of Pass Through Module

In this part of the assignment, you are given VHDL code for the TWOSTATE_MODULE, which replaces one of the LOOPBACK_MODULEs in the RAD_LOOPBACK_CORE. You are also given code for the testbench that you will use for the rest of the semester. Figure 2 shows a block diagram of the testbench setup and the flow of data through the testbench.

Figure 2: Testbench block diagram.

Functionally, the TWOSTATE_MODULE is very similar to the LOOPBACK_MODULE: it simply passes data through unchanged. However, it is structured differently so that it can later be modified to perform other processing. Figure 3 shows the block diagram for the circuit.


Figure 3: Twostate_module block diagram.

As seen in Figure 3, the module has two main components: 1) a finite state machine, and 2) a counter. The finite state machine has two states: an "init" state and a "dout" state. In the "init" state, the machine waits for a start of cell signal, and its 'data_sel' signal selects the NUL output to drive 'nxt_data_out'. When it receives a start of cell signal, it goes into the "dout" state, where 'data_sel' selects the incoming data to drive 'nxt_data_out'. The counter keeps track of the number of words received. After receiving 14 words, the module knows that it has received a full cell and returns to the "init" state unless it sees another start of cell signal.

Figure 4 shows the bubble diagram for the finite state machine. Note that the finite state machine stays in the "init" state until the first 'soc_in'. During this time, the counter is disabled and the machine outputs no data. When the 'soc_in' signal arrives, the machine transitions to the "dout" state. It also sets the counter to zero, enables the counter, and outputs the incoming data.


Figure 4: Two-state finite state machine bubble diagram.
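The two-state behavior can be checked in software before writing VHDL. The Python model below is a sketch under stated assumptions, not the provided TWOSTATE_MODULE source: the NUL output is taken to be all zeros, a start of cell restarts the counter in either state, and `clock()` returns the contents of the output registers after each RAD_Clock edge. The class name `TwoState` is hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

CELL_WORDS = 14  # words per cell, as counted by TWOSTATE_MODULE
NUL = 0          # assumed value of the NUL output: all zeros


@dataclass
class TwoState:
    """Cycle-level Python model of the two-state pass-through module (a sketch)."""
    state: str = "init"
    count: int = 0         # index of the most recent word within the current cell
    soc_out: bool = False  # output registers, updated on each clock edge
    data_out: int = NUL

    def clock(self, soc_in: bool, data_in: int) -> Tuple[bool, int]:
        """Apply one RAD_Clock edge and return (soc_out, data_out) after it."""
        if soc_in:
            # Start of cell: load the counter with zero and pass the first word.
            nxt_state, nxt_count, data_sel = "dout", 0, True
        elif self.state == "dout":
            nxt_count = self.count + 1  # index of this word within the cell
            data_sel = True
            # The 14th word (index 13) ends the cell; go back to "init".
            nxt_state = "init" if nxt_count == CELL_WORDS - 1 else "dout"
        else:
            # "init" without SOC: counter disabled, data_sel selects NUL.
            nxt_state, nxt_count, data_sel = "init", self.count, False
        nxt_data_out = data_in if data_sel else NUL
        # Register everything on the clock edge.
        self.state, self.count = nxt_state, nxt_count
        self.soc_out, self.data_out = soc_in, nxt_data_out
        return self.soc_out, self.data_out
```

Feeding a 14-word cell surrounded by idle words should reproduce the cell exactly and emit NUL everywhere else, which mirrors what the testbench expects from a pass-through module.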

Your task for this part of the assignment is to:

The tar files for this assignment contain three folders: vhdl, sim, and syn. The vhdl folder holds all the synthesizable files, including every VHDL file you will need to change. The sim folder holds the testbench files that you will use to simulate your circuit in ModelSim; some of these files are not synthesizable. The syn folder is where your synthesis and place-and-route results should go. The files included in the tar files are:

Part 2 : Hello World

In this part of the assignment, you should design a module to replace the TWOSTATE_MODULE in the RAD_LOOPBACK_CORE. Your module should check the VCI field of each new incoming cell. If the VCI equals 5, your module should then check the first two words of the cell payload. If the payload begins with the letters "HELLO", your module should write "WORLD." in their place. After that, your module should simply output the remaining bytes of the cell as they come in. Figure 5 shows graphically how the module should work if all the conditions are met.

Figure 5: Cell processing for matching cell.
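A whole-cell reference model is handy for generating expected outputs for simulation. The sketch below rests on assumptions the handout does not spell out: word 0 holds the 32-bit ATM UNI header (VCI in bits 19..4), word 1 is the HEC/pad word, and words 2 to 13 carry the 48-byte payload with the first character in the most significant byte. It also assumes that "D." overwrites the top two bytes of the second payload word while its remaining bytes pass through. The names `pack`, `vci_of`, and `hello_world` are hypothetical.

```python
from typing import List

CELL_WORDS = 14
TARGET_VCI = 5


def pack(s: bytes) -> int:
    """Pack up to 4 ASCII bytes into a 32-bit word, first byte in bits 31..24."""
    return int.from_bytes(s.ljust(4, b"\x00"), "big")


HELL = pack(b"HELL")
WORL = pack(b"WORL")


def vci_of(header: int) -> int:
    """VCI field of a 32-bit ATM UNI header word (GFC:4 VPI:8 VCI:16 PTI:3 CLP:1)."""
    return (header >> 4) & 0xFFFF


def hello_world(cell: List[int]) -> List[int]:
    """Return the processed cell; cells that do not match pass through unchanged."""
    out = list(cell)
    if len(cell) != CELL_WORDS or vci_of(cell[0]) != TARGET_VCI:
        return out
    # "HELLO" occupies payload word 0 plus the top byte of payload word 1.
    if cell[2] == HELL and (cell[3] >> 24) == ord("O"):
        out[2] = WORL
        # "D." overwrites the top two bytes; the remaining bytes pass through.
        out[3] = (pack(b"D.") & 0xFFFF0000) | (cell[3] & 0x0000FFFF)
    return out
```

For example, a VCI 5 payload starting "HELLO THERE!" would come out as "WORLD.THERE!", while the same payload on VCI 6, or "MELLO" on VCI 5, would be returned untouched.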

If at any time the incoming data does not meet the specified conditions, your module should simply output the bytes of the incoming cell as they come in. Here are some examples where the specified conditions are not met:


Figure 6: Cell processing for mismatched VCI.

First, cells should be processed only if they arrive on the correct VCI. In this example, we have chosen to process cells on VCI=5. If the VCI doesn't match, the cell should pass through the circuit without modification, as shown in Figure 6.


Figure 7: Cell processing for mismatched payload

Second, for those cells that do arrive on the correct VCI, the string must match across every payload word it spans. For the string shown in Figure 7, a mismatch is found in the first byte of the first word. Since "MELLO" doesn't match "HELLO", the contents of the cell should be left unchanged.


Figure 8: Cell processing for mismatched payload (2)

Performing a string match on the FPX is slightly complicated by the fact that the payload arrives as a stream of words, not all at once. Since an FPX module receives only one word per clock cycle, the circuit must remember the results of previous comparisons to ensure that all current and previous words matched before it writes "WORLD." in the current and future clock cycles.
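Because "WORL" must replace the first payload word before the circuit has seen the 'O' in the second, one option is to hold each word in a one-word delay register so the decision is known before the held word leaves the module; a one-cycle delay stays within the "few clock cycles" the introduction allows. The Python sketch below models that approach cycle by cycle. It is one possible design, not necessarily the one in Figure 9, and it assumes word 0 is the ATM header (VCI in bits 19..4), word 1 is the HEC/pad word, and the payload starts at word 2. `HelloWorldStream` and its fields are hypothetical names.

```python
from typing import Optional, Tuple

TARGET_VCI = 5
HELL = int.from_bytes(b"HELL", "big")
WORL = int.from_bytes(b"WORL", "big")
D_DOT = int.from_bytes(b"D.\x00\x00", "big")  # "D." in the top two bytes

Word = Tuple[bool, int]  # (soc, data) for one bus cycle


class HelloWorldStream:
    """Cycle-level sketch: one word in per clock, output delayed by one word."""

    def __init__(self) -> None:
        self.index: Optional[int] = None  # word position in the current cell
        self.vci_ok = False               # header carried TARGET_VCI
        self.hell_ok = False              # ...and payload word 0 was "HELL"
        self.held: Optional[Word] = None  # the one-word delay register

    def clock(self, soc: bool, data: Optional[int]) -> Optional[Word]:
        """Accept one bus cycle and return the word leaving the delay register."""
        if data is None:  # idle cycle: drain the delay register unchanged
            out, self.held = self.held, None
            return out
        if soc:
            self.index = 0
        elif self.index is not None:
            self.index += 1
        cur = data
        if self.index == 0:
            self.vci_ok = ((data >> 4) & 0xFFFF) == TARGET_VCI
            self.hell_ok = False
        elif self.index == 2:
            self.hell_ok = self.vci_ok and data == HELL
        elif self.index == 3 and self.hell_ok and (data >> 24) == ord("O"):
            # Every earlier comparison matched and this one does too, so the
            # held "HELL" word becomes "WORL" and "D." overwrites the top two
            # bytes of this word.
            self.held = (False, WORL)
            cur = D_DOT | (data & 0xFFFF)
        out, self.held = self.held, (soc, cur)
        return out
```

The two flags play the role of the "status of previous comparisons": each one records that every comparison so far succeeded, so the final check only has to inspect the current word.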

Implementation Hints for Part 2

There are many different ways to implement a design in hardware that produce the same behavior. In general, when implementing a hardware design, first draw a block diagram that labels each component and specifies what each component should do. Next, if the design contains a finite state machine, draw a bubble diagram for it and draw timing diagrams for the different cases that the finite state machine will encounter. Here is one way to implement the hello world design by modifying the TWOSTATE_MODULE:


Figure 9: Helloworld_module block diagram.
The block diagram of the circuit is shown in Figure 9. As the figure shows, the block diagrams for the hello world and two-state circuits are almost the same; the major difference between them is their finite state machines. If you choose to use this design, coding the finite state machine in VHDL is left as an exercise. The bubble diagram for the finite state machine is shown in Figure 10.

Figure 10: Hello world finite state machine bubble diagram.
As you can see, the hello world finite state machine is slightly more complicated than the two-state finite state machine (it has four additional states). The hello world finite state machine begins in the initial state "init"; from there it can go into either the "pad" state or the "dout" state, depending on the inputs it sees. From the "pad" state, it goes on to the "hell_check" state, where it checks the first word of the payload to see whether it contains the letters 'HELL'. Depending on what it sees, it branches either to check for 'O' or to simply pass data through...
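The transitions described in the paragraph above can be written as a next-state function. The Python sketch below is a simplified, hypothetical reading of the text rather than a transcription of Figure 10: it uses five states (`O_CHECK` is an invented name for the state that checks for 'O'), whereas the diagram has six. It assumes the same cell layout as the earlier discussion implies (header word, HEC/pad word, then payload) and a word counter that raises `last_word` on the 14th word. The returned `match` flag marks the cycle in which every comparison has succeeded.

```python
from enum import Enum, auto
from typing import Tuple

TARGET_VCI = 5
HELL = int.from_bytes(b"HELL", "big")


class State(Enum):
    INIT = auto()
    PAD = auto()         # HEC/pad word after the header (assumed cell layout)
    HELL_CHECK = auto()  # compares payload word 0 against "HELL"
    O_CHECK = auto()     # hypothetical name: checks payload word 1 for 'O'
    DOUT = auto()        # passes the rest of the cell through


def next_state(state: State, soc: bool, word: int, last_word: bool) -> Tuple[State, bool]:
    """Return (next state, match); match is True only when "HELLO" completes."""
    if soc:
        # The header word carries the VCI in bits 19..4.
        vci = (word >> 4) & 0xFFFF
        return (State.PAD if vci == TARGET_VCI else State.DOUT), False
    if state is State.PAD:
        return State.HELL_CHECK, False
    if state is State.HELL_CHECK:
        return (State.O_CHECK if word == HELL else State.DOUT), False
    if state is State.O_CHECK:
        return State.DOUT, (word >> 24) == ord("O")
    if state is State.DOUT:
        return (State.INIT if last_word else State.DOUT), False
    return State.INIT, False
```

Stepping this function once per word of a matching cell should visit PAD, HELL_CHECK, O_CHECK, and DOUT in that order and return to INIT after the last word, which is the shape a timing diagram for the matching case would show.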

Figure 11: Timing Diagram for matching cell.

A timing diagram of the circuit for the case where all the conditions are met is shown in Figure 11. You will have to draw the timing diagrams for the other three cases where one of the conditions is not met.

Things you need to turn in

Here is a checklist of the things you need to turn in: