CSE 566: Reconfigurable System On Chip Design
John W. Lockwood, Fall 2004
Strings can be predefined in the VHDL source code that implements the circuit. They can also be modified at run-time by sending special UDP/IP packets to the circuit's Control Processor.
The circuit scans for strings that are up to 32 characters (64 hex nibbles) in length. The circuit can be programmed to find any of four different strings. A diagram of how the string scanning will be performed with live traffic is shown in the figure below. Using the circuit, you will be able to GET pages from the web server on the Agent-Jones website, and process the bytes in hardware as they stream from the server to your PC. One FPX will extract data from a TCP/IP stream, while a scanning application (ScanApp) implemented in another FPX module will count the content that goes through the network.
Within the scan circuit, hardware implements a 9-stage pipeline that holds a total of 36 bytes. The pipeline stages are built from positive-edge-triggered D (delay) flip-flops, whose contents hold the most recent bytes of the flow. Data moves through the nine pipeline stages, ck1 through ck9, as four-byte words.
The first data to arrive in the stream moves into ck1, then into ck2, and eventually reaches ck9. Together, the nine stages hold a 36-byte check string (9 stages × 4 bytes).
At each clock cycle, four parallel comparisons are made using the xor-nand method described above: one 32-byte comparison starts at the first byte of ck9, another starts at the second byte of ck9, and likewise for the third and fourth bytes. If, on any clock cycle, 32 consecutive bytes of data match (that is, all eight processes examining the check string at the same offset report success), the circuit reports a match.
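The Python sketch below is a purely behavioral model of the nine-stage pipeline and the four offset comparisons; it is not the lab's VHDL. The names WORD, STAGES, MATCH_LEN and scan are illustrative, plain byte equality stands in for the xor-nand compare, and the pipeline is assumed to start out zero-filled. Because the window advances four bytes per cycle and the four offsets cover the four possible alignments, every starting position in the stream is checked exactly once.

```python
WORD = 4         # bytes per pipeline stage
STAGES = 9       # ck1 .. ck9
MATCH_LEN = 32   # bytes per search string


def scan(stream: bytes, pattern: bytes) -> list[int]:
    """Return the stream positions at which `pattern` is reported."""
    assert len(pattern) == MATCH_LEN
    # stages[0] models ck1 (newest word); stages[8] models ck9 (oldest word).
    stages = [bytes(WORD)] * STAGES
    hits = []
    n_words = (len(stream) + WORD - 1) // WORD
    for cycle in range(n_words):
        # Pad a short final word with zero bytes (modeling only).
        word = stream[cycle * WORD:(cycle + 1) * WORD].ljust(WORD, b"\0")
        stages = [word] + stages[:-1]        # every stage shifts by one word
        window = b"".join(reversed(stages))  # 36 bytes, ck9 first and ck1 last
        for offset in range(WORD):           # four parallel comparisons
            # Eight 4-byte checks at this offset; all must succeed.
            if all(window[offset + WORD * k:offset + WORD * (k + 1)]
                   == pattern[WORD * k:WORD * (k + 1)]
                   for k in range(MATCH_LEN // WORD)):
                # window[0] holds stream byte (cycle + 1) * WORD - WORD * STAGES.
                hits.append((cycle + 1) * WORD - WORD * STAGES + offset)
    return hits
```

In this sketch, a string that ends exactly at the end of the data is not reported until one more word shifts in, because the offset-0 comparison reads ck9 through ck2. For a 32-byte string P, `scan(P, P)` returns an empty list, while `scan(P + b"zzzz", P)` returns `[0]`.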
What if the amount of data in a stream is not a multiple of four bytes? That is the purpose of valid_byte_count. This signal is a copy of the extra information stored in the data FIFO that records how many bytes are valid in each 32-bit word read out of the FIFO (the information is generated by the TCP Processor). If not all four bytes of a packet's final word are valid, the matching processes do not perform all four comparisons (at all four offsets); this way, the results are as expected.
When a packet ends, the flow's final 32 bytes in the check string are stored in SDRAM. The flow identifier, which is unique for each flow, drives the address pins. When a packet arrives from the TCP Processor and the new-flow signal is not asserted, the scan module queries the state store manager. The state store manager then reads memory at the flow identifier and returns the saved 32 bytes to the scan module.
At this point, the first four bytes of the new packet are read in (landing in ck1). Together with the 32 restored bytes, they form the full 36-byte check string, so scanning continues as if the interruption had never taken place.
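As an illustration of the save-and-restore idea, here is a behavioral Python sketch. It abstracts the four-byte pipeline into a plain byte string, and StateStore, scan_packet and the dictionary standing in for SDRAM are hypothetical names, not part of the lab code. It shows that a string split across two packets of the same flow is still found, and that a packet from another flow arriving in between does not disturb the saved state.

```python
STATE_LEN = 32  # bytes saved per flow between packets


class StateStore:
    """Hypothetical stand-in for the SDRAM state store, addressed by flow ID."""

    def __init__(self) -> None:
        self._mem: dict[int, bytes] = {}

    def load(self, flow_id: int, new_flow: bool) -> bytes:
        # A new flow has no history; otherwise read back what was saved.
        return b"" if new_flow else self._mem.get(flow_id, b"")

    def save(self, flow_id: int, data: bytes) -> None:
        # Keep only the most recent STATE_LEN bytes of the flow.
        self._mem[flow_id] = data[-STATE_LEN:]


def scan_packet(store: StateStore, flow_id: int, new_flow: bool,
                payload: bytes, pattern: bytes) -> int:
    """Count matches of `pattern` that end inside this packet's payload."""
    history = store.load(flow_id, new_flow)
    data = history + payload
    count = 0
    start = data.find(pattern)
    while start != -1:
        # Matches lying entirely inside the restored history were already
        # counted when an earlier packet was scanned, so skip them.
        if start + len(pattern) > len(history):
            count += 1
        start = data.find(pattern, start + 1)
    store.save(flow_id, data)
    return count
```

For example, scanning `b"hello" + P[:20]` on flow 1 finds nothing, and a later packet `P[20:] + b"bye"` on the same flow reports one match, even if a packet from flow 2 was scanned in between.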
To watch the state store in action, view the signals
Keep in mind that preloading occurs: the state machines are designed so that on the same clock cycle that you switch to a new flow (as seen on the upd_flow_id signal), that flow's data is already available on ss_in(1/2/3/4) for reading. We will take advantage of this.
The "Other Storage" module is simply a second state store that works exactly like the State Store described above. It is wired to the second SDRAM module.
In the provided code, the Other Storage stores a word-swapped copy of the State Store data. This choice is arbitrary; it was wired this way only so that you can see the module in action during system simulation. So if the state store has:
For each flow ID, this second state store module (the "other store") provides four 64-bit values in memory for our use (see oth_out1/oth_in1 through oth_out4/oth_in4).
Search strings can be changed dynamically with control packets, which are UDP packets received by the Control Processor. To watch this behavior, view the following signals
First, implement a circuit that tracks the statistics of string matching for each flow. In particular, track how many times each of the four programmable strings occurs in the content and track the time when a flow starts. To do this, perform the following steps:
1. Add four 16-bit "string_count" counters to the scan module. Zero these counters when a new flow begins (see the "upd_newflow" signal).
2. Add a 32-bit "timestamp" counter to the scan module that increments by 1 on every clock cycle. Whenever a new flow arrives, capture its arrival time by making a static copy of the counter value when the "upd_newflow" signal is asserted.
3. Store these values in the other store in the following format: the first 64-bit value holds the four 16-bit counters, and the first 32 bits of the second 64-bit value hold the start timestamp. All remaining bits should be zero. Remember that the values must already be on the oth_out signals when those signals are copied onto OTH_UPD_DATA to be sent to storage. One such copy happens shortly after a newflow signal is asserted, so keep the oth_out signals up to date at all times.
4. When the values for an existing (non-new) flow come back from the other store on oth_in1 through oth_in4, copy them into the counters. The counters should then be saved at the end of each packet if an update is required. (When is an update required? Will you have to change how this is determined? Will the counters ever need an update when the strings do not?) In this way, you keep running totals of matched strings for every flow.
5. When an end-of-flow occurs, send that flow's counters, its start timestamp, the current timestamp, and its flow ID (from upd_flow_id) to the Control Processor. Do this by extending a new bus out to the Control Processor through a new FIFO that you create (in VHDL, with Coregen, or by any other method). Make the FIFO large enough that it never becomes full; room for 10 or 20 of these records should be more than sufficient.
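The storage format described above (counters and start timestamp packed into the other store's 64-bit values) can be checked with a small Python sketch. The bit layout is an assumption: "first" is read as most significant, as in a VHDL std_logic_vector(63 downto 0), and string counter 1 is placed in the top 16 bits. pack_oth_out and unpack_oth_in are hypothetical helper names used only for illustration.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF


def pack_oth_out(counts: list[int], start_ts: int) -> list[int]:
    """Return the four 64-bit values for oth_out1 .. oth_out4.

    Assumed layout (MSB first):
      value 1: count1 | count2 | count3 | count4  (16 bits each)
      value 2: start timestamp in bits 63..32, zeros in bits 31..0
      values 3 and 4: all zeros
    """
    assert len(counts) == 4
    v1 = 0
    for c in counts:
        v1 = (v1 << 16) | (c & MASK16)
    v2 = (start_ts & MASK32) << 32
    return [v1, v2, 0, 0]


def unpack_oth_in(values: list[int]) -> tuple[list[int], int]:
    """Inverse of pack_oth_out: recover counters and start time from oth_in1 .. oth_in4."""
    v1, v2 = values[0], values[1]
    counts = [(v1 >> shift) & MASK16 for shift in (48, 32, 16, 0)]
    return counts, (v2 >> 32) & MASK32
```

Under the same assumed layout, the first value in VHDL would be a concatenation such as `oth_out1 <= string_count1 & string_count2 & string_count3 & string_count4;` (signal names hypothetical). Whichever layout you choose, unpacking must mirror it exactly.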
Hint #1: Remember that you can perform simple arithmetic (such as addition) on std_logic_vector signals (for example, by using the ieee.std_logic_unsigned package). All new counters should handle overflow by rolling back to zero.
Hint #2: Keep track of your timing! Most of the timing is not very difficult if you pay attention, but it is easy to get tripped up.
Hint #3: Make sure to add your new FIFO's .vhd file to the Makefile in the sim directory, or else ModelSim will see the FIFO's signals as "Undefined."
Hint #4: Notice that the upd_newflow signal may be high for longer than one clock cycle. You need to ensure that the start time captured for a flow is for the first clock cycle of that flow.
Hint #5: Assign values to each signal in only one process; driving the same signal from several processes creates multiple drivers.
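Hint #1 and Hint #4 can be illustrated with a cycle-level Python model: counters wrap to zero on overflow, and the start time is captured only on the first cycle that upd_newflow is high (a rising edge). The list-of-levels input and the function names are hypothetical. The model also assumes that consecutive new flows are separated by at least one cycle with upd_newflow low; if that might not hold, the design would also need to watch upd_flow_id for changes.

```python
MASK32 = (1 << 32) - 1


def increment(value: int, width: int) -> int:
    """Add 1 and roll back to zero on overflow, like a width-bit unsigned counter."""
    return (value + 1) & ((1 << width) - 1)


def capture_start_times(upd_newflow: list[int]) -> list[int]:
    """Return the timestamp captured for each new flow.

    upd_newflow[t] is the sampled level of the signal on clock cycle t.
    """
    timestamp = 0   # free-running 32-bit counter
    prev = 0        # level of upd_newflow on the previous cycle
    starts = []
    for level in upd_newflow:
        # Capture only on the first cycle the signal is high, so a pulse
        # lasting several cycles still records a single start time.
        if level and not prev:
            starts.append(timestamp)
        prev = level
        timestamp = increment(timestamp, 32)
    return starts
```

In VHDL, the equivalent is a registered copy of upd_newflow from the previous clock edge and a test that the signal is '1' now but was '0' before, inside a single clocked process.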
Second, you are going to modify the Control Processor to change when and what is sent out in the UDP packets. The resulting control packets that will be written to SW_CELLSOUT.DAT should look like:
Implement the FIFO and bus mentioned in #5 of the first part above.
Comment out the current output process and use it as a template to output UDP packets containing the following values: flow ID, start timestamp, end timestamp, and the four counters for that flow (the counters should be extended to 32-bit values on output). Remember to set the correct length in the UDP header.
Trigger packet output based on whether the FIFO is empty: whenever you are not already outputting a packet and the FIFO is not empty, output a packet containing the next set of values from the FIFO, and continue until the FIFO is empty.
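To check the packet layout and the UDP length arithmetic, the Python sketch below models the output side. It assumes every field, including the flow ID, is sent as a 32-bit big-endian word (the flow ID width is an assumption), that the port numbers are placeholders, and that the checksum is left at 0 (meaning "no checksum" for UDP over IPv4). build_udp and drain are hypothetical names. Seven 32-bit fields give a 28-byte payload, so the UDP length field is 8 + 28 = 36.

```python
import struct
from collections import deque

UDP_HEADER_LEN = 8
FIELDS = 7  # flow ID, start timestamp, end timestamp, four counters


def build_udp(record, src_port: int = 0, dst_port: int = 0) -> bytes:
    """Build one UDP datagram (header + payload) from a FIFO record.

    record = (flow_id, start_ts, end_ts, count1, count2, count3, count4);
    each field is packed as a 32-bit big-endian word, which zero-extends
    the 16-bit counters.
    """
    assert len(record) == FIELDS
    payload = struct.pack("!7I", *record)        # 7 x 4 = 28 bytes
    length = UDP_HEADER_LEN + len(payload)       # UDP length covers header + data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum 0
    return header + payload


def drain(fifo: deque) -> list[bytes]:
    """Output one packet per FIFO entry until the FIFO is empty."""
    packets = []
    while fifo:
        packets.append(build_udp(fifo.popleft()))
    return packets
```

The hardware sends one packet at a time and checks the FIFO's empty flag again after each packet finishes; the `while` loop in drain is the sequential equivalent of that trigger.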