论文标题
年龄分门的布鲁姆过滤器
Age-Partitioned Bloom Filters
论文作者
论文摘要
Bloom过滤器(BF)广泛用于一组元素上的大约成员查询。 BF变体允许拆卸,一组无界尺寸或在无界流上查询滑动窗口。但是,在最后一个情况下,最佳当前方法是基于字典(例如,基于杜鹃过滤器或微调),似乎基于BF的方法与基于字典的方法永远不会具有竞争力。在本文中,我们介绍了年龄分门的Bloom过滤器,这是一种基于BF的滑动窗口重复检测方法,它不仅在时间复杂性上具有竞争力,而且比当前基于字典的方法(例如沼泽)具有更好的空间用法,以某些适度的松弛为代价。与基于字典的方法不同,APBF保留了BF简单性,对于基于硬件的实现很重要,并且可以集成已知的改进,例如Double Hashing或blocking。我们提出了一个年龄分配的封闭开花过滤器变体,该变体可以使用每个插入的2-3个缓存线访问,每个查询约2-4个,即使是高精度过滤器也是如此。
Bloom filters (BF) are widely used for approximate membership queries over a set of elements. BF variants allow removals, sets of unbounded size or querying a sliding window over an unbounded stream. However, for this last case the best current approaches are dictionary based (e.g., based on Cuckoo Filters or TinyTable), and it may seem that BF-based approaches will never be competitive to dictionary-based ones. In this paper we present Age-Partitioned Bloom Filters, a BF-based approach for duplicate detection in sliding windows that not only is competitive in time-complexity, but has better space usage than current dictionary-based approaches (e.g., SWAMP), at the cost of some moderate slack. APBFs retain the BF simplicity, unlike dictionary-based approaches, important for hardware-based implementations, and can integrate known improvements such as double hashing or blocking. We present an Age-Partitioned Blocked Bloom Filter variant which can operate with 2-3 cache-line accesses per insertion and around 2-4 per query, even for high accuracy filters.