Using decision trees to improve signature-based intrusion detection


Christopher Kruegel and Thomas Toth



Summary:


This paper introduces the use of decision trees for signature-based intrusion detection systems (IDSs). Signature-based systems are designed to identify known suspicious behaviors and are commonly used as they produce few false alarms (when compared to anomaly-based systems). A signature-based IDS compares an input to a set of predefined signatures (or rules) and reports an alert if there are any matches.

The approach used in this paper is to cluster the rules in a decision tree in order to reduce the number of rules that need to be compared. A variant of the ID3 algorithm is used in the construction of the decision tree in an effort to reduce the size of the tree. The authors implement the technique as a patch for Snort and compare the processing times for Snort with and without the patch.


Critique:


This paper fails to identify the nature of the underlying data and tends to focus on the header, not the content, of packets. (An example in the paper actually uses the source address as a feature in the decision tree, even though this address is unlikely to be of any use when considering malicious activities.) When header information is being compared, packet classification methods are more advanced than the method presented here. Additionally, this paper fails to cite HiCuts, a packet classifier that utilizes decision trees that are very similar to the trees presented in this paper.

When constructing the decision tree, information gain (based on entropy) is used to select the next feature to branch upon. The authors fail to motivate the usefulness of this metric in this context.

In the experimental results, the memory consumption for Snort with decision trees is plotted, however, it would be useful to have a plot of the consumption of Snort without the decision trees also. Moreover, for the tested system with 1581 signatures, the memory requirements are close to 250 MB. Packet classification techniques require substantially less memory.

The authors emphasize the importance of reducing the depth of the decision tree and state that an optimal tree would consist of only two levels. This objective ignores the importance of minimizing the branching factor, especially for nodes high in the tree. For instance, if a feature can assume a large number of values, it may be undesirable to place it at the root node where it will have to be evaluated for every packet.

Finally, the authors report a constant time overhead for comparing IPv4 addresses and bitfield types, and a log n overhead time (where n is the number of rules) for comparing integers. This is inconsistent as integers are simply bitfields themselves.



These points were made by CS679 participants and summarized by Sharlee Climer