Reviewer: Mike Wilson
Date: 11-17-2005
How would you rate this paper, relative to others we have read? top 10%
How would you rate your kowledge of the topic of this paper? novice
What problem or issue does the paper address? Why is it important?
The paper presents an algorithm for splitting data stored in TCAMs into different groups. This split reduces the power and memory needs for multi-match searches in TCAMs (TCAMs are normally used for first-match searches).
What are the main contributions of the paper and why are they important?
The authors adapt a known approximation algorithm for solving the maximum satisfiability problem (known to be NP-hard) by converting the hypergraph cut problem to a satisfiability problem. (It is well-known that any NP-Complete problem can be converted to any other in polynomial time.) By piggy-backing the guarantees for the known algorithm, SSA can make the same guarantees about the split solution.
The results of this split can be used to organize the data in TCAM groups such that intersections of searches are minimized; that is, for a search with multiple matches, there are "few" multi-matches from any given TCAM group. Since each multi-match will require additional lookups to get the second and subsequent results, this reduces the total search time as well as reducing the power consumption of searching the entire TCAM space repeatedly (subsequent searches are restricted to only those TCAMs where necessary). Splitting intersections also reduces the number of extra entries required.
Expected end results: reduced power consumption, reduced memory consumption, reduced lookup time.
How significant are these contributions relative to previous work?
Note: Prior work for this space should be restricted solely to the multi-match problem.
No prior work meets the same needs. The MUD scheme adds some memory, but requires O(k) searches of the whole space where k is the number of matches. Geometric Intersection requires each search intersection be inserted as a new filter, which means that the number of filters increases dramatically - O(N^F), N filters with F fields. If the rate of intersection is high, this is not practical, although it's fine for low-intersection sets and gives O(1) performance. Finally, bitvector approaches work well if the entire system is designed to handle parallel search result processing. However, this is rarely the case. Thus, SSA is most efficient if (1) searches are frequent, (2) intersections are frequent, (3) performance can be worse than O(1), and (4) the results are to be processed by generic hardware.
Beyond the cases where SSA is ideal, it should perform adequately under conditions appropriate for other techniques. Thus, it is a superior general purpose solution, too. This makes SSA's benefits much more significant.
Give detailed comments justifying your view of the paper.
I was heartened that the authors considered the cost of updating the data structure. In many papers of this nature, the authors tend to ignore the penaltiesof changing the data structures. It should be noted that the update costs are nearly 1 for the common cases, although as much as 20 in worst-case.
The results bear out the plan. In just splitting to 2 filter sets, SSA removes well over 95% of intersections. For a 4-way split, the reduction is nearly 100%. I would be interested in seeing the results of a comparison to the perfect split, since SSA is only guaranteed to remove 50% of the splits as a minimum. How closely does SSA approximate the optimal solution when applied to real-world data?
As expected, this dramatically increases lookup speed, yielding a worst-case performance around 20 Gbps at 4ns/TCAM lookup. By comparison, the MUD solution has a worst-case performance around 6.66 Gbps. While neither solution reaches actual core line rates, it is clear which one is closer. (SSA is triple the speed of MUD on current SNORT filters.)
SNORT filters aren't the only filters in real-world use. The authors also analyze the solution as applied to a synthetic database of filters. The performance here is slightly worse, as expected for a larger data set. Nonetheless, the results are convincing, and it's clear that SSA is a worthwhile addition.
The general plan of taking a known poly-time approximation algorithm for an NP-Complete problem and adapting it to another NP-Complete problem is usually only used for theoretical results. The work here is clear and lucid, and the authors performed an unusually complete analysis of the results. I would definitely recommend including this paper in a journal if I were on the review committee.