CS/COE 535M: Acceleration of Algorithms in Reconfigurable Hardware
Lockwood, Fall 2001
Machine Problem 1
Text Processing in Reprogrammable Hardware
Assigned: Wednesday, September 5, 2001
Due Date:
Part 1: Thursday, September 13, 2001, 4pm
Part 2: Thursday, September 20, 2001, 4pm
Purpose: Introduction to the FPX Environment and CAD Synthesis Tools
Points: 50
Introduction
The FPX provides simple and fast mechanisms to process cells or packets
directly in hardware. By performing all computations in FPGA hardware,
cells and packets can be processed at the full line speed of the card
(currently 2.4 Gbit/s).
For this assignment, you will code a sample application called
'Hello World'. This circuit will be implemented as a module on the FPX.
The application uses the FPGA hardware to search for a string on a
particular flow and selectively replace contents of the payload. Your
resulting circuit should operate at 100 MHz or faster on a Xilinx
XCV1000E-FG680-7 FPGA. It should also occupy less than 1% of the
available gates on the device.
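The 100 MHz requirement can be checked with simple arithmetic. A module that accepts one 32-bit word per cycle (the width of the DataIn bus described below) moves 32 × 100 MHz = 3.2 Gbit/s. That is above the 2.4 Gbit/s line rate. The Python snippet below only computes that product. It ignores per-cell overhead and idle cycles between cells, so treat the result as an upper bound rather than a measured figure.

```python
# Peak throughput of a datapath that accepts one 32-bit word per clock cycle.
BUS_WIDTH_BITS = 32          # width of DataIn/DataOut
CLOCK_HZ = 100_000_000       # 100 MHz RAD_Clock target
LINE_RATE_BPS = 2.4e9        # line rate quoted in the introduction


def datapath_throughput_bps(width_bits: int, clock_hz: int) -> int:
    """Bits per second moved when one word is transferred every clock cycle."""
    return width_bits * clock_hz


throughput = datapath_throughput_bps(BUS_WIDTH_BITS, CLOCK_HZ)
print(f"datapath: {throughput / 1e9:.1f} Gbit/s, line: {LINE_RATE_BPS / 1e9:.1f} Gbit/s")
print("keeps up with the line:", throughput >= LINE_RATE_BPS)
```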
Background: Simplified RAD Entity
The FPX is divided into two major components, the Networking Interface
Device (NID) and the Reprogrammable Application Device (RAD).
You will implement all circuits as modules in the RAD.

Figure 1: Generic RAD Entity
The simplest configuration for a RAD module is
shown in Figure 1.
As with all RAD modules, the circuit operates at
the 100 MHz frequency of RAD_Clock.
RAD_Reset is asserted, active high, synchronously
with RAD_Clock in advance of data arrivals.
The RAD has two interfaces: one typically used for
data from the switch (egress), and the other typically used for
data from the line card (ingress). Modules can be mapped to either interface.
For this assignment, the design is mapped to the switch (sw) side of the RAD.
Data arrives as cells on a 32-bit data bus, DataIn[32].
Data leaves the module on the DataOut[32] bus. In general, a module
can add, modify, delete, or delay cells.
The arrival of a new cell on the bus is indicated by
the StartOfCell (SOC) signal. This signal goes high to indicate
that the bus contains the first word of the cell.
In order to send data, a module
simply asserts SOC when it has a new cell ready to transmit.
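The following is a minimal behavioral model of SOC framing, written in Python rather than VHDL. It serializes cells onto the bus as one (soc, data) pair per clock cycle and then splits them apart again. It assumes 14-word cells, which is the cell length the TWOSTATE_MODULE counter uses in Part 1. It also assumes cells arrive back to back. The function names are illustrative only.

```python
from typing import Iterable, List, Tuple

WORDS_PER_CELL = 14  # assumed cell length (the Part 1 counter counts 14 words)


def frame_cells(cells: Iterable[List[int]]) -> List[Tuple[int, int]]:
    """Serialize cells back to back as (soc, data) pairs, one pair per clock cycle."""
    bus: List[Tuple[int, int]] = []
    for cell in cells:
        if len(cell) != WORDS_PER_CELL:
            raise ValueError(f"expected {WORDS_PER_CELL} words, got {len(cell)}")
        for index, word in enumerate(cell):
            # SOC is high only while the first word of the cell is on the bus.
            bus.append((1 if index == 0 else 0, word & 0xFFFFFFFF))
    return bus


def split_cells(bus: List[Tuple[int, int]]) -> List[List[int]]:
    """Receiver side: SOC opens a new cell; words after a full cell are ignored."""
    cells: List[List[int]] = []
    for soc, word in bus:
        if soc:
            cells.append([])
        if cells and len(cells[-1]) < WORDS_PER_CELL:
            cells[-1].append(word)
    return cells


first, second = list(range(14)), list(range(100, 114))
bus = frame_cells([first, second])
print(len(bus), [cycle for cycle, (soc, _) in enumerate(bus) if soc])
```

The receiver keys entirely on SOC plus a word count. This is the same information a RAD module has available on each clock edge.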
The TransmitCellAvailable signal (TCA) is used for flow control.
A module can block the arrival of a new cell by asserting this signal
no less than 6 cycles before the end of the previous cell.
Modules must defer the transmission of cells if the outgoing
interface is congested, as indicated by downstream TCA.
The 'Hello World' application never creates cells and never delays cells
by more than a few clock cycles, so it never causes network
congestion. 'Hello World' can therefore simply map
the outgoing TCA indicator to the incoming interface.
Description of Assignment
This assignment has two parts.
The purpose of the first part of this assignment is to help you
become familiar with the tools and the testbenches that you will use to
simulate your modules for the rest of the semester. The second part of
this assignment introduces you to the recommended design flow
for creating FPGA circuits that can be downloaded onto the FPX.
The RAD contains a circuit called RAD_LOOPBACK, which
instantiates another circuit called RAD_LOOPBACK_CORE. RAD_LOOPBACK_CORE
receives data from the NID, passes it to the egress module,
passes it to the ingress module, and then
returns the cells to the NID.
The circuit that actually loops the data for both the
egress and the ingress module is called LOOPBACK_MODULE.
In this assignment,
you will replace one of the LOOPBACK_MODULEs with a module
of your own that transforms the contents of cells to say
"Hello World".
Part 1: Simulation and Synthesis of Pass Through Module
In this part of the assignment, you are given VHDL code for the
TWOSTATE_MODULE. The TWOSTATE_MODULE replaces one of the LOOPBACK_MODULEs
in the RAD_LOOPBACK_CORE. You are also given code for the testbench that you
will be using for the rest of the semester. Figure 2 shows a block diagram
of how the testbench is set up and the flow of data through the testbench.

Figure 2: Testbench block diagram.
Functionally, the TWOSTATE_MODULE is very similar to the LOOPBACK_MODULE:
all it does is pass data through unchanged. However, it is structured
differently so that it can later be modified to do other
things. Figure 3 shows the block diagram for the circuit.

Figure 3: Twostate_module block diagram.
As seen in Figure 3, the module has two main components: 1) a finite state
machine, and 2) a counter. The finite state machine has two states, an
"init" state and a "dout" state. In the "init" state it does nothing
until it receives a start of cell signal; meanwhile its 'data_sel' signal
selects the NUL output to go to 'nxt_data_out'. When it receives a start of
cell signal, it goes into the "dout" state. In the "dout" state,
its 'data_sel' signal selects the incoming data to go to 'nxt_data_out'. The
counter keeps track of the number of words the module has received. After
receiving 14 words, the module knows that it has received a full cell, and it
returns to the "init" state unless it sees another start of cell signal.
Figure 4 shows the bubble diagram for the finite state machine. Note that
the finite state machine stays in the "init" state until the first 'soc_in'.
During this time, the counter is disabled and the machine outputs no data.
When the 'soc_in' signal arrives, the machine transitions to the "dout" state.
It also sets the counter to zero, enables the counter,
and outputs the incoming data.

Figure 4: Two-state finite state machine bubble diagram.
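The following Python sketch is a cycle-level behavioral model of the two-state machine and its word counter. It can help you predict what Modelsim should show. It models the value selected onto 'nxt_data_out' in each cycle. The output flops then delay that value by one clock, which the model leaves out. NUL is assumed to be an all-zero word. The class and method names are hypothetical.

```python
from dataclasses import dataclass

WORDS_PER_CELL = 14  # words per cell, as counted by counter_4_bit
NUL = 0              # assumed value selected while idle


@dataclass
class TwoStateFsm:
    """Behavioral model of the TWOSTATE_MODULE FSM plus its word counter."""
    state: str = "init"
    count: int = 0  # index (0..13) of the word just passed; fits in 4 bits

    def step(self, soc_in: int, data_in: int) -> int:
        """Advance one clock cycle and return the word selected for nxt_data_out."""
        if soc_in:
            # Start of cell: counter set to zero, data_sel picks the incoming data.
            self.state, self.count = "dout", 0
            return data_in
        if self.state == "dout":
            self.count += 1
            if self.count == WORDS_PER_CELL - 1:
                self.state = "init"  # last of 14 words; wait for the next SOC
            return data_in
        return NUL  # "init": counter idle, data_sel picks NUL


fsm = TwoStateFsm()
cell = list(range(1, 15))
stream = [(0, 0xAAAA)] + [(int(i == 0), w) for i, w in enumerate(cell)] + [(0, 0xBBBB)]
out = [fsm.step(soc, data) for soc, data in stream]
print(out == [NUL] + cell + [NUL], fsm.state)
```

Words that arrive outside a cell are replaced by NUL, and all 14 words of the cell pass through unchanged. The counter only ever holds 0 through 13, which is why a 4-bit counter is enough.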
Your task for this part of the assignment is to:
- Simulate the code in Modelsim, making sure that you are comfortable
enough with Modelsim and the commands in the simulation makefile to be
able to simulate your own designs later on. You can refer to the
Modelsim tutorial for help on how to run Modelsim using the makefile
you are given.
- Synthesize the code to get the netlist using Synplify Pro, again
making sure that you are comfortable enough with Synplify Pro to be
able to synthesize your own designs. You can refer to the
Synplicity tutorial for help on how to run Synplify Pro.
- Generate the bit stream that configures the FPGA by using
the build file to run the Xilinx Alliance tools. Make sure you understand
the commands in the build file.
The tar files for this assignment contain
three folders: vhdl, sim, and syn. The vhdl folder has all the files that
are synthesizable. All the VHDL files you will need to change are
in this folder. The sim folder has all the testbench files that you will use
to simulate your circuit in Modelsim. Some of these files are not
synthesizable. The syn folder is where your synthesis and place-and-route
results should go. The files included in the tar files are:
- vhdl folder:
  - rad_loopback.vhd: this is the top-level RAD entity; it
    instantiates RAD_LOOPBACK_CORE. You should not need to make any
    changes to this module.
  - rad_loopback_core.vhd: the original RAD_LOOPBACK_CORE
    instantiated two LOOPBACK_MODULEs, one at the ingress and one at the
    egress. This version instantiates a LOOPBACK_MODULE at the ingress and a
    TWOSTATE_MODULE at the egress.
  - loopback_module.vhd: this module just loops data back to the
    NID. You should not need to make any changes to this module.
  - blink.vhd: this module is instantiated by RAD_LOOPBACK_CORE.
    It blinks an LED on the FPX so that you know that your bit file has
    been downloaded correctly. You should not need to make any
    changes to it.
  - twostate_module.vhd: this is the top level for the two-state
    module. This is where you will need to start making changes to make
    "hello world" work.
  - counter_4_bit.vhd: this is the VHDL file for the counter
    that counts the number of words you have received for a cell.
  - data_flop.vhd: this is the VHDL file for the output flop
    for data.
  - soc_flop.vhd: this is the VHDL file for the output flop for
    soc.
  - tca_flop.vhd: this is the VHDL file for the output flop for
    tca.
  - mux2.vhd: this is the VHDL file for the multiplexor.
  - twostatefsm.vhd: this is the VHDL file for the two-state
    finite state machine that you will eventually have to modify to
    make the "helloworld" finite state machine.
- sim folder:
  - Makefile: this is the makefile for compiling and simulating the
    design.
  - testbench.vhd: this is the top-level testbench for the rad
    loopback module. It instantiates RAD_LOOPBACK, a clock, fake_NID_in,
    and fake_NID_out. You should not need to make any changes to this
    module.
  - clock.vhd: this is VHDL code that simulates a clock with a
    10 ns period. You should not need to make any changes to this module.
  - fake_NID_out.vhd: this module reads in data (test cells) from the
    TESTCELL.DAT file and sends it to RAD_LOOPBACK (refer to Figure 2).
  - fake_NID_in.vhd: this module receives data (modified cells)
    from RAD_LOOPBACK and writes it to CELLSOUT.DAT.
  - INPUT_CELLS.DAT: this is the data file that TESTBENCH reads in and
    sends to fake_NID_out.
  - modelsim.ini: this file sets the paths for the different Xilinx
    libraries that are needed for compiling and simulating VHDL code.
  - testbench.do: this is the do file for running the simulation
    after starting Modelsim.
- syn folder: save your Synplify Pro .prj (project) file in here.
  - rad-xcve100 folder: save your Synplify Pro synthesis results in here.
  - build: this is a script file with all the commands for the
    Xilinx Alliance tools.
  - rad_loopback.ucf: this is the file with the signal-to-pin
    assignments.
  - bitgen.ut: this file passes the FPGA configuration options
    for generating the .bit file.
Part 2: Hello World
In this part of the assignment, you should design a module to replace the
TWOSTATE_MODULE in the RAD_LOOPBACK_CORE. Your module should check the VCI
field of each new incoming cell. If its VCI equals 5, your module should
check the first two words of the cell payload. If the payload contains
the letters "HELLO", your module should write "WORLD." After that, your
module should simply output the remaining bytes in the cell as they
come in. Figure 5 shows graphically how the module should work if all the
conditions are met.

Figure 5: Cell processing for matching cell.
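The datapath is 32 bits wide, so each payload word carries four ASCII characters. 'HELL' therefore fills the first payload word, and 'O' occupies one byte of the second. The Python helper below computes the corresponding comparison constants. It assumes the first character sits in the most significant byte (bits 31..24). Check that byte order against the test cells before relying on it.

```python
def pack_ascii_word(text: str) -> int:
    """Pack up to four ASCII characters into a 32-bit word.

    Assumes the first character sits in bits 31..24; unused low-order bytes are 0x00.
    """
    data = text.encode("ascii")
    if len(data) > 4:
        raise ValueError("a 32-bit word holds at most four characters")
    return int.from_bytes(data.ljust(4, b"\x00"), byteorder="big")


HELL = pack_ascii_word("HELL")        # compare the whole first payload word
O_BYTE = pack_ascii_word("O") >> 24   # compare only bits 31..24 of the second word
print(f"HELL = 0x{HELL:08X}, O = 0x{O_BYTE:02X}")
```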
If at any time the incoming data does not meet the specified conditions,
your module should just output the bytes in the incoming cell as they come
in. Here are some examples where the specified conditions are not met:

Figure 6: Cell processing for mismatched VCI.
First, cells should only be processed if they arrive on the
correct VCI. In this example, we have chosen to process
cells on VCI=5. If the VCI doesn't match, the cell should
pass through the circuit without modification,
as shown in Figure 6.
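The VCI test only needs the VCI field of the header word. In the standard ATM cell header, in both the UNI and NNI formats, the 16-bit VCI occupies bits 19..4 of the first 32-bit word. The Python sketch below assumes the FPX presents the header in that bit order. make_header is a hypothetical helper for building test cells.

```python
TARGET_VCI = 5


def header_vci(header: int) -> int:
    """Return the 16-bit VCI field of a 32-bit ATM cell header word.

    Assumed layout (most significant bit first, UNI format):
      bits 31..28 GFC | 27..20 VPI | 19..4 VCI | 3..1 PTI | 0 CLP
    In the NNI format the VPI widens to bits 31..20; the VCI does not move.
    """
    return (header >> 4) & 0xFFFF


def make_header(vpi: int, vci: int, pti: int = 0, clp: int = 0, gfc: int = 0) -> int:
    """Hypothetical helper: build a UNI header word for a test cell."""
    return (((gfc & 0xF) << 28) | ((vpi & 0xFF) << 20) | ((vci & 0xFFFF) << 4)
            | ((pti & 0x7) << 1) | (clp & 0x1))


for vci in (5, 6):
    header = make_header(vpi=0, vci=vci)
    print(f"header 0x{header:08X}: process = {header_vci(header) == TARGET_VCI}")
```

Only the 16 VCI bits take part in the comparison. A cell with VCI 5 is processed regardless of its VPI, PTI, or CLP values.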

Figure 7: Cell processing for mismatched payload
Second, for those cells that do arrive on the correct VCI,
the string must match across every payload word it spans.
For the string shown in Figure 7,
a mismatch is found in the first byte of the first payload word.
Since "MELLO" doesn't match "HELLO", the contents of the
cell should be left unchanged.

Figure 8: Cell processing for mismatched payload (2)
Performing a string match on the FPX is slightly complicated
by the fact that the payload arrives as a stream of words,
not all at once.
Since an FPX module receives only one word per clock cycle,
the circuit must remember the results of previous comparisons to
ensure that all current and previous words matched before it
writes the word "WORLD." in the current and future clock cycles.
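One way to carry earlier results forward is a single 'matched' register that is ANDed with each new comparison. Once any check fails, the cell can no longer qualify. The Python model below traces that register word by word. It rests on four assumptions about word positions, none of which are taken from the figures:

- word 0 is the header;
- word 1 is a non-payload word, the one the "pad" state described under Implementation Hints skips;
- word 2 must equal 'HELL';
- word 3 must start with 'O'.

```python
from typing import Callable, Dict, List

# Assumed word positions within a cell (word index -> comparison on that word).
CHECKS: Dict[int, Callable[[int], bool]] = {
    0: lambda w: ((w >> 4) & 0xFFFF) == 5,         # header: VCI (bits 19..4) == 5
    2: lambda w: w == 0x48454C4C,                  # first payload word: "HELL"
    3: lambda w: ((w >> 24) & 0xFF) == ord("O"),   # second payload word starts with 'O'
}
DECISION_WORD = max(CHECKS)  # the last word that must be compared


def match_trace(cell: List[int]) -> List[bool]:
    """Value of the running 'matched' register after each word of the cell."""
    matched = True
    trace: List[bool] = []
    for index, word in enumerate(cell):
        check = CHECKS.get(index)
        if check is not None:
            # Only the current word is visible in this cycle; earlier results
            # survive solely through the `matched` register.
            matched = matched and check(word)
        trace.append(matched)
    return trace


hello = [5 << 4, 0, 0x48454C4C, 0x4F000000] + [0] * 10
mello = [5 << 4, 0, 0x4D454C4C, 0x4F000000] + [0] * 10   # "MELL" then "O"
print(match_trace(hello)[DECISION_WORD], match_trace(mello)[DECISION_WORD])
```

In the "MELLO" cell the 'O' still matches at word 3, but the register already went false at word 2. This is the "all current and previous words matched" condition described above.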
Implementation Hints for Part 2
There are many different ways to implement a given function in hardware. In
general, when implementing a design in hardware, the
first thing to do is to draw a block diagram for the design, labeling each
component and deciding what each component should do. Next, if there is a
finite state machine in the design, you should draw a bubble diagram for
the finite state machine and draw timing diagrams for the different cases
that the finite state machine will encounter. Here is one way to implement
the hello world design by modifying the TWOSTATE_MODULE:

Figure 9: Helloworld_module block diagram.
The block diagram of the circuit is shown in Figure 9. As the figure shows,
the block diagrams for the hello world and two-state circuits are almost
the same. The major difference between them is their finite state machines.
If you choose to use this design, coding the finite state machine in VHDL is
left as an exercise. The bubble diagram for the finite state
machine is shown in Figure 10.

Figure 10: Hello world finite state machine bubble diagram.
As you can see, the hello world finite state machine is slightly more
complicated than the two-state finite state machine (four additional states).
The hello world finite state machine begins in the initial state "init";
from there it can go into either the "pad" state or the "dout" state,
depending on the inputs it sees. From the "pad" state, it goes on
to the "hell_check" state, where it checks the first word of the
payload to see whether it contains the letters 'HELL'. Depending on what it sees,
it branches either to check for 'O' or to just pass data through...
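The transitions spelled out above can be captured as a next-state function. This Python sketch covers only those transitions and makes two assumptions. First, "o_check" is a hypothetical name for the state that checks for 'O'. Second, the "inputs" that send "init" to "pad" are taken to be SOC together with a VCI of 5. Exits from the remaining states are left as part of the exercise, so the function returns None for them.

```python
from typing import Optional

HELL = 0x48454C4C   # "HELL" packed with the first character in bits 31..24 (assumed)
TARGET_VCI = 5


def next_state(state: str, soc_in: int, data_in: int) -> Optional[str]:
    """Next-state logic for the transitions described in the handout.

    Returns None for states whose exits are left as part of the exercise.
    """
    if state == "init":
        if not soc_in:
            return "init"                    # wait for a start of cell
        vci = (data_in >> 4) & 0xFFFF        # assumed VCI position: bits 19..4
        return "pad" if vci == TARGET_VCI else "dout"
    if state == "pad":
        return "hell_check"                  # skip the word before the payload
    if state == "hell_check":
        return "o_check" if data_in == HELL else "dout"
    return None                              # "o_check", "dout", ...: left to you
```

Writing the next-state logic as a pure function of (state, inputs) mirrors the usual two-process VHDL style. In that style, one combinational process computes the next state and a clocked process registers it.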

Figure 11: Timing Diagram for matching cell.
A timing diagram of the circuit for the case where all the conditions are
met is shown in Figure 11. You will have to draw the timing diagrams
for the other three cases where one of the conditions is not met.
Things you need to turn in
Here is a checklist of the things you need to turn in:
- Part 1: Simulation and Synthesis of Pass Through Module
  - Printout of the Modelsim simulation of the TWOSTATE_MODULE. Only print
    "rad_clk_i" down through "tcaff_se_nid_i" for the first 400 ns. If you
    need help printing the required signals, you can refer to
    selecting waves for printing.
  - Bit file for the TWOSTATE_MODULE generated by the Xilinx Alliance tools.
- Part 2: Cell Processing Module
  - Part 2A: Modify the "Two State" code to have the module do "Hello World".
    - Timing diagram for the case when VCI is not equal to 5
    - Timing diagram for the case when VCI is equal to 5 but the first
      payload word does not contain the letters 'HELL'
    - Timing diagram for the case when VCI is equal to 5 and the first
      payload word contains the letters 'HELL' but the second word does not
      start with the letter 'O'
    - VHDL code for the finite state machine that compiles and works
      with the given code for the other blocks in the block diagram.
    - Run './build >mylog'. Cut, paste, and print the sections of this
      log that identify the Design Summary and the first
      10 entries from the 'PAR statistics'.
  - Part 2B: Modify the "Hello World" code to have the module write
    'Good-bye' in the last two words of the cell.
    - Bubble diagram for the modified finite state machine.
    - Timing diagram for the modified finite state machine.
    - VHDL code that compiles and works.
    - Run './build >mylog'. Cut, paste, and print the sections of this
      log that identify the Design Summary and the first
      10 entries from the 'PAR statistics'.
- Please print the grade sheet and staple it on top
  of your machine problem.
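For Part 2B, note that 'Good-bye' is exactly eight ASCII characters, so it fills two 32-bit words. The Python sketch below shows the words that end up in the last two positions. It assumes 14-word cells and places the first character in the most significant byte. In hardware, the word counter already identifies those two positions. The sketch does not decide which cells receive the overwrite; that follows the assignment text.

```python
from typing import List

WORDS_PER_CELL = 14  # assumed cell length, as in Part 1


def pack4(text: str) -> int:
    """Pack exactly four ASCII characters, first character in bits 31..24 (assumed)."""
    data = text.encode("ascii")
    if len(data) != 4:
        raise ValueError("need exactly four characters")
    return int.from_bytes(data, byteorder="big")


GOOD_BYE = [pack4("Good"), pack4("-bye")]  # "Good-bye" split across two words


def write_good_bye(cell: List[int]) -> List[int]:
    """Return a copy of the cell whose last two words spell 'Good-bye'."""
    if len(cell) != WORDS_PER_CELL:
        raise ValueError(f"expected {WORDS_PER_CELL} words, got {len(cell)}")
    return cell[:-2] + GOOD_BYE


out = write_good_bye(list(range(WORDS_PER_CELL)))
print([f"0x{w:08X}" for w in out[-2:]])
```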