CSE 770 Paper Review

Reviewer: Michela Becchi
Date: 12-8-2005

How would you rate this paper, relative to others we have read? top 25%, but not top 10%

How would you rate your knowledge of the topic of this paper? familiar, but not expert

What problem or issue does the paper address? Why is it important?

The paper presents a string matching architecture for intrusion detection systems meant to have high throughput/area ratio, to maintain tight bounds on worse case performance and to allow concurrent update of the rule database during normal operation. The basic idea at the core of the proposal is a novel way to implement the Aho-Corasick algorithm.
Not only IDS are nowadays a very topical subject, but also string matching has plenty of applications in different areas. Therefore, the acceleration of algorithms for string matching is a topic of interest.

What are the main contributions of the paper and why are they important?

The main contribution of the paper is the idea of implementing the Aho-Corasick algorithm instead than through a single state machine where each transition is associated to an 8-bit character, through a combination of eight "binary" state machines. Each of these state machines represents the evolution of a single bit in the binary encoding of the characters of the represented rules. In a final step, bit vectors are used in order to intersect the results of the different FSMs. This architecture allows reducing the fanout of each step from a maximum of 256 to 2, dramatically diminishing the memory requirements.
A minor contribution consists in the idea of partitioning the rules among tiles in order to further improve performance.

How significant are these contributions relative to previous work?

I think that the idea of having eight FSM with reduced fanout in place of a FSM with potentially large fanout is novel.

Give detailed comments justifying your view of the paper.

The paper presents a novel design for implementing the Aho-Corasick algorithm for string matching in the context of intrusion detection systems. The authors aim at providing good throughput with minimal area occupancy, tight bound to worse case performance and rules’ update during concurrent operation.

The central observation is that high pro-node fanout in a FSM encoding one character in each state leads to high memory requirements to store the table. This can be drastically reduced by mapping such a state machine into 8 FSM, one per character bit, and having an additional step where the partial results are merged. The authors propose additional improvements to this basic idea consisting in grouping the rules and assigning them to different tiles, in order to further diminish the size of the required FSMs and match vectors.

I think that the idea in the paper is novel and interesting. The analysis is clear and convincing, and the choices concretely motivated through practical calculation and referring to the SNORT database.

I guess that previous solutions compress the FSM observing that some nodes have higher fanout and some a more limited one (so 256 is really an upper limit). However, this solution proves to better cover the worse case.

I appreciated the fact that the problem of concurrently updating the database, normally neglected in other papers I read on this topic, is also addressed.
Overall, my view of this paper is positive, both for the idea and the way to present and argue it.