论文标题

中间值线性化性:定量正确性标准

Intermediate Value Linearizability: A Quantitative Correctness Criterion

论文作者

Rinberg, Arik, Keidar, Idit

论文摘要

大数据处理系统通常采用批处理的更新和数据草图来估计大数据的某些属性。例如,计数素描草图近似于数据流中元素发生的频率,以及分批计数的批量计数事件。本文着重于此类对象的并发实现的正确性。具体而言,我们考虑定量对象,其返回值来自一个完全有序的域,重点是$(e,d)$ - 有界对象,这些对象估算了数量,其中最多$ e $的数量,概率至少为$ 1- d $。 事实上的并发对象的正确性标准是线性化的。在线性化的情况下,当读取重叠更新时,它必须在更新之前或之后返回对象的值。例如,考虑一个单个批处理的增量操作,该操作计算三个新事件,将批处理计数器的价值从$ 7 $到$ 10 $碰到。在可线化的计数器实现中,重叠的读取必须返回其中之一。但是,我们观察到,在典型的用例中,任何中间值也可以接受。为了捕获这种自由度,我们提出了中间值线性化性(IVL),这是一种新的正确性标准,可放宽线性化的性能以允许返回中间值,例如上面的示例中的$ 8 $。粗略地说,IVL允许阅读返回两个在线性化性下合法的返回值之间的值。 IVL的一个关键功能是,$(e,d)$ - 有限对象的并发IVL实现保留$(e,d)$ - 有限。为了说明此结果的功能,我们给出了$(e,d)$ - 有限的计数尺寸草图,即IVL(尽管不是可线化)。最后,我们证明IVL允许与可线化的实现更便宜。

Big data processing systems often employ batched updates and data sketches to estimate certain properties of large data. For example, a CountMin sketch approximates the frequencies at which elements occur in a data stream, and a batched counter counts events in batches. This paper focuses on the correctness of concurrent implementations of such objects. Specifically, we consider quantitative objects, whose return values are from a totally ordered domain, with an emphasis on $(e,d)$-bounded objects that estimate a quantity with an error of at most $e$ with probability at least $1 - d$. The de facto correctness criterion for concurrent objects is linearizability. Under linearizability, when a read overlaps an update, it must return the object's value either before the update or after it. Consider, for example, a single batched increment operation that counts three new events, bumping a batched counter's value from $7$ to $10$. In a linearizable implementation of the counter, an overlapping read must return one of these. We observe, however, that in typical use cases, any intermediate value would also be acceptable. To capture this degree of freedom, we propose Intermediate Value Linearizability (IVL), a new correctness criterion that relaxes linearizability to allow returning intermediate values, for instance $8$ in the example above. Roughly speaking, IVL allows reads to return any value that is bounded between two return values that are legal under linearizability. A key feature of IVL is that concurrent IVL implementations of $(e,d)$-bounded objects remain $(e,d)$-bounded. To illustrate the power of this result, we give a straightforward and efficient concurrent implementation of an $(e, d)$-bounded CountMin sketch, which is IVL (albeit not linearizable). Finally, we show that IVL allows for inherently cheaper implementations than linearizable ones.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源