Discussion report on: Gigabit Rate Packet Pattern-Matching Using TCAM

By Fang Yu, Randy H. Katz, T. V. Lakshman, ICNP 2004

 

Topic:

A TCAM based, multiple-pattern matching algorithm for layer 7 (content) pattern matching problems at gigabit rate. Since signatures are more complex for intrusion detection than some of the other layer 7 problems such as HTTP load balancing, email SPAM filtering etc., the paper is geared towards that.

 

Discussion:

NOTE: Since there are many limitations to point out in this paper, I bring out points at relevant places rather than summarizing the paper first.

 

The paper starts out well. Authors briefly point out problems with existing solutions but some in the class raised concerns that they do not cover important results from recent research but specifics were not discussed.

 

Memory access is often a limiting factor in intrusion detection systems. The proposed solution is based on TCAM but TCAMs are known for being expensive and power consuming, their use of it becomes one of the things to analyze first. The TCAM size that they used is 240KB. But since TCAMs of 2MB are common nowadays, 240KB is not bad. These large(r) TCAMs cost ~$400-500 though. While this may not be much compared to the cost of good intrusion detection systems, content matching is not the only component of such systems. But the thing to note here is that the 240KB size that they mention is only for their simulation of CalmAV virus database which only contains simple patterns.

 

Along with TCAMs, their solution involves SRAM access. For every TCAM hit, they access the combined pattern table stored in memory once and matching table as many times as there are entries in the PHL (Partial Hit List).

 

As far as SRAM requirements, for their rather limited simulation using only ClamAV patterns, we are left to infer the SRAM size to be < 10MB. Unfortunately, they do not discuss this for SNORT database that involves more complex correlated patterns. For long correlated patterns, the PHL is grows as the entries can not be removed after 4 bytes. That means many more memory accesses because every TCAM hit involves a memory access to the combined-pattern table and j number of accesses to matching table where j is the number of entries in the PHL. To address this, authors recommend restricting the inter sub-pattern distance in correlated patters. Though it sounds too restrictive, 128B for pattern length is pretty large and so their recommendation may not be too bad. But what is not good about their memory requirements is their high per byte memory requirement.

 

Unfortunately, the large sub-pattern mentioned above is not the only cause of increased memory access. Negation patterns can cause them too – suppose we are looking for “user” followed by say “login” after 50 bytes, a malicious attacker can say “user” repeatedly, sequentially, thus forcing us to keep the PHL entries for 50 bytes and we already saw the bad effect of more entries in PHL.

 

Although this and the paper makes us believe that their system is vulnerable to malicious attacks, simulation on SNORT database might have shown a different result but unfortunately the authors don’t report that. But to be fair to the authors, currently, all intrusion detection systems all vulnerable to this and solution using bloom filter is even more vulnerable.

 

Finally, in the result for their simulation on SNORT database, they do not mention the window size. It looks like they used the smallest size of 20 and if that is the case, the result would be bad if we used a bigger size of 200 mentioned in table 7.

 

In addition to all these, the authors have completed missed another important contributor to memory: authors say that their work is applicable to layer 7 but they never discuss flow inspection while layer 7 inspection requires treating flow as the entity and not packets. Flow inspection involves storing flow state information which will only add to memory requirement. Worse, matching patterns across packets, leave alone across flows, will further increate number of entries in PHL, increasing memory access more.

 

Authors highlight their ability to handle:

·        Simple patterns

o       Deterministic

o       Non-deterministic: case insensitive alphabets and wildcard byte

·        Complex/ composite patterns that include

o       Arbitrarily long patterns

o       Correlated patterns and

o       Negation

·        Regular expression like patterns.

 

Unfortunately, their discussion of correlated patterns is very limited. An eg. of a correlated pattern is P3 = P1 * P2 but authors don’t discuss long content between P1 and P2.

 

Since payload processing is often done along with header processing, the paper could have discussed considerations for integration with other such functionalities.

 

The good things about the paper are: their analysis of effect of TCAM width on real and synthetic traces for two pattern databases and the pipelining of TCAM and SRAM accesses.

 

As far as quality of the writing, they start out well but when it gets to subject matter, unfortunately, the writing is not so good. The worst part is when they explain the matching algorithm – there are many tables used but the use of their indices is not at all explained when that is fundamental to understand their matching algorithm. Similarly, there is also a problem with consistency of the names being used – for instance, they start out with the term ‘matching table’ but somewhere down the paper they switch to ‘mapping table’ without making any connection between them. Finally, they never explain “window size” though we can infer what it means.

 

In all, the idea presented in the paper is good but not fundamentally novel or groundbreaking. Not surprisingly, the class ranked it in the middle third.

 

Shobana