论文标题
简单的集素描
Simple Set Sketching
论文作者
论文摘要
想象一下在哈希表中处理碰撞,通过在每个单元格中存储在每个单元格中的钥匙集的位置,或者在那里碰撞。这似乎是一个可怕的想法:对于$αn$键和$ n $ buccets,$α$是恒定的,我们希望由于碰撞而无法恢复键的恒定部分。 我们表明,如果这种碰撞分辨率策略独立重复了三次,则情况会逆转:如果$α$低于$ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \左右,那么我们可以以很高的可能性在线性时间内恢复所有插入的键。 即使我们对数据结构的描述很简单,它的分析也不平淡。我们的方法可以看作是Eppstein和Goodrich的可逆花过滤器(IBF)的变体。虽然IBF涉及每个存储桶的显式校验和确定存储桶是否存储一个键,但我们利用了商标的想法,即钥匙的某些位隐含在存储的位置中。我们让这些用作隐式校验和。这些位还不足以确保不会发生任何错误,而主要的技术挑战是表明解码可以从这些错误中恢复。
Imagine handling collisions in a hash table by storing, in each cell, the bit-wise exclusive-or of the set of keys hashing there. This appears to be a terrible idea: For $αn$ keys and $n$ buckets, where $α$ is constant, we expect that a constant fraction of the keys will be unrecoverable due to collisions. We show that if this collision resolution strategy is repeated three times independently the situation reverses: If $α$ is below a threshold of $\approx 0.81$ then we can recover the set of all inserted keys in linear time with high probability. Even though the description of our data structure is simple, its analysis is nontrivial. Our approach can be seen as a variant of the Invertible Bloom Filter (IBF) of Eppstein and Goodrich. While IBFs involve an explicit checksum per bucket to decide whether the bucket stores a single key, we exploit the idea of quotienting, namely that some bits of the key are implicit in the location where it is stored. We let those serve as an implicit checksum. These bits are not quite enough to ensure that no errors occur and the main technical challenge is to show that decoding can recover from these errors.