Review on Packet Classification Using Multidimensional Cutting

The paper abstracts the packet classification as a geometric point-location  problem: each rule forms a box in a k-dimensional space and each packet needed to be classified is a point in this space. The problem is to find the least cost box which contains this point. The problem is solved by recursively cutting the space to some smaller regions, thus less boxes resides in each sub-region. The corresponding data structure is a decision tree where each branch addresses an sub-region. The lookups take the packet header information to walk the tree from root to leaf and eventually figure out the matched rule. The algorithm is proved be much more efficient than HiCuts and other algorithms such as EGT-PC and ABV in terms of worst case number of memory accesses and data structure size. It is slightly slower than RFC but uses much less memories.

The algorithm can be seemed as an improvement to the HiCuts algorithms. The similarities are:

1. Recursively cut the region to a set of equal sized sub-regions. Based on the local optimal decision, the algorithms need to choose which dimension(s) to cut and how many cuttings to make. The decision is from heuristics and memory/speed tradeoff and not necessarily turns out to be a global optimized decision in terms of memory consumption and tree depth.

2. As another tradeoff, maintain a short list of rules in each leaf node instead of keeping cutting the corresponding sub-region. The linear searching at the leaves is very useful to reduce the storage, and potentially also improve the searching speeds. This is because some rules are so close in some sense that they needs unnecessarily high cost to separate them by recursively cuttings.

3. They both use two heuristics to further reduce the memory usages: Node merging and Rule Overlap.

Though, HyperCuts is different from HiCuts in some ways:

1. Cutting multiple dimensions rather than one in each step so that each tree node represents a Hypercube rather than a Hyperplane as in HiCuts. This slightly increases the decision information need to be kept in each node but greatly decrease the tree size and tree depth. Similar as the selection of number cuts, the selection of dimensions is heuristic.

2. Pulling rules upwards. The rules shared by all child nodes are stored in parent nodes, so when cutting the child nodes, these rules are not considered again. This refinements can save the storage (and also possibly contribute to reduce the tree depth)

3. Region compaction. If all rules in a region actually covered by a smaller rectangular sub-region, the region is shrunk to the sub-region. This can make the following cuttings more efficient.  However, we need keep track the new region's boundary to correctly direct the following cuts.

The reason that the paper doesn't compare HyperCuts with TCAM solutions is that TCAM has its own problems: It is high cost, low density, high power consumption and can't directly represent arbitrary ranges. If we can meet the worst case requirement by using algorithmic solutions, why use TCAMs which cost you ten (or more) times more?

There are several problems we need clarifications:

1.Evaluation metrics: The evaluation results are hard to interpret. For the memory usage, what's the unit of the number? Does that count the storage for rules themselves. Since we know there are many rule duplications, this may result in a large memory consumption. I don't think using pointers for rules is a good solution to save memory, because this will double  the number of memory access per rule. We'd like to see the average number of bytes used per rules. That is a more reasonable metric. There are also some missing messages to help us understand the number of memory accesses: what's the leaf size? Does each node access takes to memory accesses? Why don't just tell us the tree depth?

2. Effectiveness of some heuristics: Generally, from the evaluation results, we can't tell which heuristics contributes to the performance improvement more. The effectiveness of some heuristics are actually not evaluated and justified. The authors didn't explicitly show weather or not they utilized these heuristics when doing the simulation and comparison (at least not in their pseudo code for algorithm description). We need quantitatively show the algorithm progress when we applied these heuristics incrementally.

3. Not working well in edge routers: only two fields (source IP and destination IP) are defined in these tables and most prefixes are very specific so there are not enough freedom to perform multidimensional cuttings.

4. Incremental updates: Like HiCuts, incremental updates might not be easily supported in HyperCuts. A rule may be duplicated in many subtrees. We need keep tracking all copies of a rule in all possible branches when trying to insert or delete one rule. When doing the insertion, if one leaf node exceeds its list capacity, we need cut this node. After insertion and deletions, at each internal node, the number of cuttings and dimensions chosen to cut might be no longer the best based on the criteria. Eventually we need to rebuild the whole data structure from scratch otherwise the performance will suffer a degradation.  So this algorithm fits for some relatively static rule sets only. What's more, if we use some heuristics in the implementation, the incremental updating can be more difficult or even impossible. For example, the region compaction changes the boundary of a regions. If the new rule need expand this region, the whole subtree form this point needs to be rebuilt.


Haoyu Song