Reviewer: Mike Wilson
Date: 12-8-2005
How would you rate this paper, relative to others we have read? top 25%, but not top 10%
How would you rate your knowledge of the topic of this paper? familiar, but not expert
What problem or issue does the paper address? Why is it important?
The authors devise a scheme for increasing the throughput of parallel pattern matching in network streams via dedicated hardware. This problem is the basis of most IDS/IPS technology today. At the moment, no technology exists that can do IDS/IPS at 10 Gbps, the current research goal.
What are the main contributions of the paper and why are they important?
The authors present a novel scheme for doing bit-parallel state machine processing in the Aho-Corasick Algorithm. The authors claim throughput increase up to a maximum worst-case of 10.074 Gbps.
How significant are these contributions relative to previous work?
The contributions is immense, if it proves workable in practice. Prior efforts focused on software techniques for combining matching into a single search, or FPGA designs for search-string parallel, massively pipelined searching. Neither of these approaches is currently able to approach this speed. The one drawback of the bit-split technique is that it relies on custom hardware, not an off-the-shelf processor or a standard FPGA board.
Some caution should be used in examining these performance numbers, since the authors seem to have done their evaluations in CACTI. CACTI is a cache-level simulator, and may not be the best way to evaluate the relative performance of these techniques. Having had recent experience with on-paper designs for hardware schemes that proved to be fundamentally flawed, I take the results with a very large brick of salt.
Give detailed comments justifying your view of the paper.
To my knowledge, the scheme - rendering large state machines into simpler partitions that can be addressed via fewer bits - is completely novel. The elegeance is compelling: by reducing states, we reduce the size of the FSM. By reducing the size, we increase the speed - particularly as we can now store the FSM in much faster SRAM.
I was pleased with the dicussion of how the split is performed. This was very simple and easy to follow, which has been lacking in many papers we've reviewed.
One point that surprised me: the scheme could easily be extended to more parallel FSMs, but the authors don't even mention this. I don't think this would necessarily be a good thing, but it is straight-forward to rewrite the original patterns as 16-bit or 32-bit patterns, then split 16 or 32 ways instead of 8 (or 4, which the authors found optimal). This method (increasing token size in match patterns) is sometimes used as a way to increase match throughput in software techniques, although I don't have a citation handy.
The update portion of the paper is an oft-neglected issue that I'm glad to see presented. This portion of the scheme looks sound and simple, although a higher bit parallelism would better amortize the cost of the replacement module.
The theoretical partitioning analysis would have been better served with a few graphs. (The graphs presented actually cover the practical analysis, which contains overhead that should have been -- but wasn't -- included in the theory.) For example, the following is 5 minutes of work that makes the situation very clear, although my parameters are a bit different (1000 signatures, average length 12):
One other problem I had with this analysis is the off-hand comment that the snort rulebase has "over 1000" signatures. As of the most recent public release, snort has 3,544. This disparity seems a bit extreme.
The biggest problem with the paper is that the data came from simulation in CACTI. While CACTI is a wonderful tool for evaluating an architecture, it is not a good tool for evaluating a design. A complete implementation in some hardware description language is a much more adequate proof of concept. Without some evidence of proof of concept, I would not trust the design to be implementable. This is the only reason I don't consider this a top-grade paper.