Using decision trees to
improve signature-based intrusion detection
Christopher Kruegel and Thomas Toth
Summary:
This paper introduces the use of decision trees for signature-based intrusion
detection systems (IDSs). Signature-based systems are designed to
identify known suspicious behaviors and are commonly used as they
produce few false alarms (when compared to anomaly-based systems). A
signature-based
IDS compares an input to a set of predefined signatures (or rules)
and reports an
alert if there are any matches.
The approach used in this paper is to cluster the
rules in a decision tree in order to reduce the number of
rules that need to be compared. A variant of the ID3 algorithm
is used in the construction of the decision tree in an effort to
reduce the size of the tree. The authors implement the
technique as a patch for Snort and compare the processing times for
Snort with and without the patch.
Critique:
This paper fails to identify the nature of the underlying data and
tends to focus on the header, not the content, of packets. (An example
in the paper actually uses the source address as a feature in the decision
tree, even though this address is unlikely to be of any use when
considering malicious activities.) When header information is being
compared, packet classification methods are more advanced than the
method presented here. Additionally, this paper fails
to cite HiCuts, a packet classifier that utilizes decision trees that
are very similar to the trees presented in this paper.
When constructing the decision tree, information gain (based on
entropy) is used to select the next feature to branch upon. The
authors fail to motivate the usefulness of this metric in this
context.
In the experimental results, the memory consumption for Snort with
decision trees is plotted, however, it would be useful to have a plot
of the consumption of Snort without the decision trees also.
Moreover, for the tested system with 1581 signatures, the memory
requirements are close to 250 MB. Packet classification techniques
require substantially less memory.
The authors emphasize the importance of reducing the depth of the
decision tree and state that an optimal tree would consist of only two
levels. This objective ignores the importance of minimizing the
branching factor, especially for nodes high in the tree. For
instance, if a feature can assume a large number of values, it may be
undesirable to place it at the root node where it will have to be
evaluated for every packet.
Finally, the authors report a constant time overhead for comparing
IPv4 addresses and bitfield types, and a log n overhead time (where n is the number
of rules) for comparing integers. This is inconsistent as integers
are simply bitfields themselves.
These points were made by CS679 participants and summarized by
Sharlee Climer