Publications on Bloom FiltersReconfigurable Network Group (In Reverse Chronological Order)
(Also available as BIBTeX format)
Abstract: High-speed packet content inspection and filtering devices rely on a fast multi-pattern matching algorithm which is used to detect predefined keywords or signatures in the packets. Multi-pattern matching is known to require intensive memory accesses and is often a performance bottleneck. Hence specialized hardware-accelerated algorithms are required for line-speed packet processing.
We present hardware-implementable pattern matching algorithm for content filtering applications, which is scalable in terms of speed, the number of patterns and the pattern length. Our algorithm is based on a memory efficient multi-hashing data structure called Bloom filter. We use embedded on-chip memory blocks in FPGA/VLSI chips to construct Bloom filters which can suppress a large fraction of memory accesses and speed up string matching. Based on this concept, we first present a simple algorithm which can scan for several thousand short (up to 16 bytes) patterns at multi-gigabit per second speeds with a moderately small amount of embedded memory and a few mega bytes of external memory. Furthermore, we modify this algorithm to be able to handle arbitrarily large strings at the cost of a little more on-chip memory. We demonstrate the merit of our algorithm through theoretical analysis and simulations performed on Snort's string set.
Abstract: Ternary Content Addressable Memory (TCAM), although widely used for general packet classification, is an expensive and high power-consuming device. Algorithmic solutions which rely on commodity memory chips are relatively inexpensive and power-efficient but have not been able to match the generality and performance of TCAMs. Therefore, the development of fast and power-efficient algorithmic packet classification techniques continues to be a research subject.
In this paper we propose a new approach to packet classification which combines architectural and algorithmic tech- niques. Our starting point is the well-known crossproduct algorithm which is fast but has significant memory overhead due to the extra rules needed to represent the crossproducts. We show how to modify the crossproduct method in a way that drastically reduces the memory requirement without compromising on performance. Unnecessary accesses to the off-chip memory are avoided by filtering them through on- chip Bloom filters. For packets that match p rules in a rule set, our algorithm requires just 4+p+e. independent memory accesses to return all matching rules, where . e << 1 is a small constant that depends on the false positive rate of the Bloom filters. Using two commodity SRAM chips, a throughput of 38 Million packets per second can be achieved. For rule set sizes ranging from a few hundred to several thousand filters, the average rule set expansion factor attributable to the al- gorithm is just 1.2 to 1.4. The average memory consumption per rule is 32 to 45 bytes.
Abstract: Network Intrusion Detection System (NIDS) performs deep inspections on the packet payload to identify, deter and contain the malicious attacks over the Internet. It needs to perform exact matching on multi-pattern signatures in real time. In this paper we introduce an efficient data structure called Extended Bloom Filter (EBF) and the corresponding algorithm to perform the multi-pattern signature matching. We also present a technique to support long signature matching so that we need only to maintain a limited number of supported signature lengths for the EBFs. We show that at reasonable hardware cost we can achieve very fast and almost time-deterministic exact matching for thousands of signatures. The architecture takes the advantages of embedded multi-port memories in FPGAs and can be used to build a full-featured hardware-based NIDS.
Abstract: High-speed packet content inspection and filtering devices rely on a fast multi-pattern matching algorithm which is used to detect predefined keywords or signatures in the packets. Multi-pattern matching is known to require intensive memory accesses and is often a performance bottleneck. Hence specialized hardware-accelerated algorithms are being developed for line-speed packet processing. While several pattern matching algorithms have already been developed for such applications, we find that most of them suffer from scalability issues. To support a large number of patterns, the throughput is compromised or vice versa.
We present a hardware-implementable pattern matching algorithm for content filtering applications, which is scalable in terms of speed, the number of patterns and the pattern length. We modify the classic Aho-Corasick algorithm to consider multiple characters at a time for higher throughput. Furthermore, we suppress a large fraction of memory accesses by using Bloom filters implemented with a small amount of on-chip memory. The resulting algorithm can support matching of several thousands of patterns at more than 10 Gbps with the help of a less than 50 KBytes of embedded memory and a few megabytes of external SRAM. We demonstrate the merit of our algorithm through theoretical analysis and simulations performed on Snort's string set.
Abstract: Hash table is used as one of the fundamental modules in several network processing algorithms and applications such as route lookup, packet classification, per-flow state management and network monitoring. These applications, which typically form components of data-path in a high-speed router, must process and forward packets with little or no buffer in order to maintain the wire-speed throughout. A poorly designed hash table can critically affect the worstcase throughput due to multiple memory accesses required for each lookup. Hence, high throughput requirement in turn underscores the need for a hash table having good and more predictable worstcase lookup performance. While most of the existing hash table based packet processing algorithms rely on the assumption that hash table lookup needs constant time, very little discussion is provided on the underlying engineering considerations to achieve this performance.
We present a novel hash table data structure and lookup algorithm which improves the performance of a naive hash table by providing better bounds on the hash collisions and memory accesses per search. Our algorithm extends the multiple-hashing Bloom Filter data structure to support exact matches. We contrive our hash table architecture by coupling our algorithm with the latest advances in embedded memory technology. Through theoretical analysis and simulations we show that our algorithm is significantly faster for practical purposes than the naive hash table using the same amount of memory, hence it can support better throughput for router applications based on hash tables.
Abstract: Network Intrusion Detection and Prevention Systems (IDPS) use string matching to scan Internet packets for malicious content. Bloom filters offer a mechanism to search for a large number of strings efficiently and concurrently when implemented with Field Programmable Gate Array (FPGA) technology. A string matching circuit has been implemented within the FPX platform using Bloom filters. Using 155 block RAMs on a single Xilinx VirtexE 2000 FPGA, the circuit scans for 35,475 unique signatures.
Abstract: Because conventional software-based packet inspection algorithms have not kept pace with high-speed networks, interest has turned to using hardware to process network data quickly. String scanning with Bloom filters can scan entire packet payloads for predifined signatures at multi-Gigabit-per-second line speeds.
Abstract We introduce the first algorithm that we are aware of to employ Bloom filters for Longest Prefix Matching (LPM). The algorithm performs parallel queries on Bloom filters, an efficient data structure for membership queries, in order to determine address prefix membership in sets of prefixes sorted by prefix length. We show that use of this algorithm for Internet Protocol (IP) routing lookups results in a search engine providing better performance and scalability than TCAM-based approaches. The key feature of our technique is that the performance, as determined by the number of dependent memory accesses per lookup, can be held constant for longer address lengths or additional unique address prefix lengths in the forwarding table given that memory resources scale linearly with the number of prefixes in the forwarding table. Our approach is equally attractive for Internet Protocol Version 6 (IPv6) which uses 128-bit destination addresses, four times longer than IPv4. We present a basic version of our approach along with optimizations leveraging previous advances in LPM algorithms. We also report results of performance simulations of our sys snapshots of IPv4 BGP tables and extend the results to IPv6. Using less than 2Mb of embedded RAM and a commodity SRAM device, our technique achieves average performance of one hash probe per lo and a worst case of two hash probes and one array access per lookup.
Abstract Recent advances in network packet processing focus on payload inspection for applications that include contentbased billing, layer-7 switching and Internet security. Most of the applications in this family need to search for predefined signatures in the packet payload. Hence an important building block of these processors is string matching infrastructure. Since conventional software-based algorithms for string matching have not kept pace with high network speeds, specialized high-speed, hardware-based solutions are needed. We describe a technique based on Bloom filters for detecting predefined signatures (a string of bytes) in the packet payload. A Bloom filter is a data structure for representing a set of strings in order to support membership queries. We use hardware Bloom filters to isolate all packets that potentially contain predefined signatures. Another independent process eliminates false positives produced by Bloom filters.
We outline our approach for string matching at line speeds and present the performance analysis. Finally, we report the results for a prototype implementation of this system on the FPX platform. Our analysis shows that with the state-of-the-art FPGAs, a set of 10,000 strings can be scanned in the network data at the line speed of OC48 (2.4 Gbps).