CSE 770 Paper Review

Reviewer: Charlie Wiseman
Date: 12-8-2005

How would you rate this paper, relative to others we have read? top 50%, but not top 25%

How would you rate your kowledge of the topic of this paper? novice

What problem or issue does the paper address? Why is it important?

This paper presents a string matching architecture and associated algorithm for use in network intrusion detection and prevention systems. As line speeds increase, more efficient means of finding strings in network packets are needed to keep these systems from reducing overall throughput.

What are the main contributions of the paper and why are they important?

The biggest contribution is the newly developed architecture and algorithm that supposedly is 10 times 'more efficient' than existing techniques.

How significant are these contributions relative to previous work?

Both the architecture and the algorithm are (to my limited knowledge) worthwhile contributions. In particular, the tiled rule module (while vaguely reminiscent of a CAM) design has much potential, and their idea of translating rule sets into many small state machines seems to be applicable to other areas in addition to being very useful in the IDS context.

Give detailed comments justifying your view of the paper.

The presented algorithm is the most interesting part of this paper (to me). Using the Aho-Corasick algorithm isn't a huge leap in this instance, but then applying what is basically a formal language subset construction to break up the larger state machine into many independent smaller machines is a great idea. Then, the developed architecture just sort of falls into place to support this algorithm. All in all, it seems like a very interesting and effective approach.

There are some questions, though. I suspect that if I knew more about the subject, I wouldn't have to ask these questions. First, one big selling point is that the entire system can fit on-chip. It's not entirely clear to me what they mean by this; are they talking about putting this on NICs, or network processors? In either case, they use the memory as many tiled blocks. I also don't know if this is a reasonable thing to do or not (how exactly do you do this efficiently?). Second, they state that they are only concerned with throughput and not latency. This is fine as a base, but what kind of latency is introduced by this system? The anwser is important if they expect this system to be used. Finally, the efficiency metrics aren't particularly clear to me. What exactly does characters/millimeter^2 mean?

This leads into the biggest problem I have with this paper. In their analysis, they discuss 'practical' issues and have a few charts showing how their design fares against others. Where does this information come from? They say that the current Snort rules set is used, but they don't clarify how they obtain their perfomance numbers. Is it simulation, or is it an actual implementation on real hardware? There are no details, which means that the results are hard to clearly interpret. There are a few other generally distracting problems with the paper. There are enough grammatical errors to be noticed ('is' instead of 'are', etc), and they often repeat information nearly word for word from other parts of the paper (primarily in-paragraph descriptions of figures often are identical to figure captions). Also, just as a note, in figure 3 they show a FSA that is supposed to match "he","she","his","hers", but it also matches "her", which I assume to be a mistake (D-state 8 shouldn't be an accepting state).