Cristian Estan, Stefan Savage, George Varghese
SIGCOMM 2003
Summary and critique by Ed Spitznagel:
SUMMARY
To simplify the daunting task of managing IP networks, the authors describe a technique for analyzing IP traffic via automatic multidimensional traffic clustering. The technique performs clustering automatically, rather than relying on (possibly useless) assumptions about how to distinguish interesting flows. It clusters along multiple dimensions, which is far more useful than a single dimension: a single-dimension analysis might show that a particular server is popular, or that a specific port value is popular, but not which server produces traffic on which port. Finally, it removes redundant data from the reports it generates, so that each report is useful rather than overwhelming.
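To make the single-dimension versus multi-dimension point concrete, here is a minimal Python sketch. It is not the authors' algorithm or the AutoFocus implementation: the flow records, the two chosen dimensions (source IP and destination port), the 25% threshold, and the redundancy rule in `compress` are all assumptions made for illustration. In particular, `compress` is only a crude stand-in for the paper's report compression.

```python
from collections import Counter
from itertools import product

WILDCARD = "*"

# Hypothetical flow records: (source IP, destination port, bytes).
FLOWS = [
    ("10.0.0.1", 80, 500),
    ("10.0.0.1", 443, 50),
    ("10.0.0.2", 80, 60),
    ("10.0.0.2", 25, 300),
    ("10.0.0.3", 53, 90),
]


def cluster_volumes(flows):
    """Sum bytes for every (src, port) pattern; either field may be a wildcard."""
    volumes = Counter()
    for src, port, nbytes in flows:
        for s, p in product((src, WILDCARD), (port, WILDCARD)):
            volumes[(s, p)] += nbytes
    return volumes


def is_more_specific(a, b):
    """True if cluster a is strictly contained in cluster b."""
    return a != b and all(y == WILDCARD or x == y for x, y in zip(a, b))


def compress(large, threshold):
    """Drop a cluster when one reported sub-cluster already explains
    all but less than `threshold` bytes of it (illustrative rule only)."""
    kept = {}
    for cluster, volume in large.items():
        children = [v for c, v in large.items() if is_more_specific(c, cluster)]
        if children and volume - max(children) < threshold:
            continue
        kept[cluster] = volume
    return kept


def report(flows, fraction=0.25):
    """Return the compressed set of clusters carrying at least `fraction` of all bytes."""
    volumes = cluster_volumes(flows)
    threshold = fraction * volumes[(WILDCARD, WILDCARD)]
    large = {c: v for c, v in volumes.items() if v >= threshold}
    return compress(large, threshold)


if __name__ == "__main__":
    for cluster, volume in sorted(report(FLOWS).items(), key=lambda kv: -kv[1]):
        print(cluster, volume)
```

On this toy data, the single-dimension clusters for source 10.0.0.1 and for port 80 both exceed the threshold, but only the two-dimensional cluster (10.0.0.1, port 80) shows that the same traffic is behind both. The sketch's compression step then drops the two single-dimension clusters as redundant.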
The paper describes the algorithms in detail, describes a prototype implementation called AutoFocus, and provides examples of results from the prototype. Further details are included in the tech report version.
CRITIQUE
The paper is well-written, and the techniques described in it are very good at what they do. However, it is important to understand what these techniques do, and what they do not do. The authors never overstate the capabilities of their work, but casual readers of the paper may misinterpret what is written.
For example, the paper mentions detection of Denial-of-Service (DoS) attacks. The methods described in the paper would indeed detect the traffic flows involved in bandwidth-intensive DoS attacks; they would not, however, be able to tell automatically whether such traffic is in fact an attack or whether such traffic is legitimate. Furthermore, many DoS attacks do not have substantial bandwidth requirements, and could thus go undetected by this tool. SYN flood attacks, for example, can be effective with just 1000 packets per second.
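A quick back-of-the-envelope calculation shows how little bandwidth such a SYN flood needs. The sketch below assumes a 40-byte SYN packet (a 20-byte IPv4 header plus a 20-byte TCP header with no options) and a hypothetical 100 Mbit/s link; neither number comes from the paper, and real SYN packets often carry options that make them somewhat larger.

```python
# Assumption: minimal SYN = 20-byte IPv4 header + 20-byte TCP header, no options.
SYN_PACKET_BYTES = 40


def flood_bandwidth_bps(packets_per_second, packet_bytes=SYN_PACKET_BYTES):
    """Bandwidth in bits per second consumed by a flood of fixed-size packets."""
    return packets_per_second * packet_bytes * 8


def link_share(packets_per_second, link_bps):
    """Fraction of a link's capacity that the flood occupies."""
    return flood_bandwidth_bps(packets_per_second) / link_bps


if __name__ == "__main__":
    pps = 1000                  # rate quoted in the critique
    link_bps = 100_000_000      # hypothetical 100 Mbit/s link
    print(flood_bandwidth_bps(pps), "bit/s")
    print(f"{link_share(pps, link_bps):.2%} of the link")
```

Under these assumptions the flood consumes 320 kbit/s, well under 1% of the link, so a report that only lists clusters above a sizable fraction of total traffic would be unlikely to surface it.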
Similarly, the paper mentions detecting the spread of Internet worms. The methods would indeed detect the traffic flows involved in the spread of a bandwidth-intensive worm, but the tool cannot tell whether these flows are legitimate. Worms can also spread in ways that are not bandwidth-intensive (e.g., by passively detecting targets instead of actively scanning).
Finally, the timescales involved are worth noting. The tool requires at least a few minutes (especially when computing multidimensional clusters), so it does not provide real-time display or detection of any sort. It still has much value to network administrators, but it is important to understand what it is and is not capable of.