任何K算法，用于列举排名的答案

论文标题

任何K算法，用于列举排名的答案

Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries

论文作者

Tziavelis, Nikolaos, Gatterbauer, Wolfgang, Riedewald, Mirek

论文摘要

我们研究了连接性查询（CQ）的排名枚举，其中答案是由给定的排名函数排序的（例如，SQL中的子句订单）。我们开发了“ Any-K”算法，该算法在不知道所需答案的数字k的情况下，通过仔细订购了中间元组的计算并避免对加入答案的实现，直到需要它们，将排名降低到加入。为此，排名函数需要遵守特定类型的单调性。支持的排名函数包括常见的重量总和，其中的答案是通过输入培训权重的总和以及任何交换性选择性二元组进行比较。我们的结果扩展了一个众所周知的不合格的二分法，该二分法只有自由结合CQ是可行的（在某些硬度假设和无自结合的CQ下）。对于此类的查询和N表示输入的大小，我们对KTH CQ答案的排名枚举方法的数据复杂性是O（N+Klogk），它只是比对数因子比O（n+K）慢的因素慢。我们工作的核心见解是，CQS的排名枚举与加权DAG中的动态编程和路径枚举的基本任务密切相关。我们发现了一个以前未知的权衡：当返回答案的数量很少时，当它们数量较大时，一个Any-K算法的复杂性较低。在更严格的单调性能下消除了这种权衡，我们为一种新颖的算法定义并利用了一种渐近地主导所有以前已知的替代方案的新算法，包括Eppstein的算法，用于枚举量。我们在一项实验研究中凭经验证明了我们的理论分析的发现，该研究强调了我们方法的优越性，而不是现有数据库系统所遵循的联接方法。

We study ranked enumeration for Conjunctive Queries (CQs) where the answers are ordered by a given ranking function (e.g., an ORDER BY clause in SQL). We develop "any-k" algorithms, which, without knowing the number k of desired answers, push down the ranking into joins by carefully ordering the computation of intermediate tuples and avoiding materialization of join answers until they are needed. For this to be possible, the ranking function needs to obey a particular type of monotonicity. Supported ranking functions include the common sum-of-weights, where answers are compared by the sum of input-tuple weights, as well as any commutative selective dioid. Our results extend a well-known unranked-enumeration dichotomy, which states that only free-connex CQs are tractable (under certain hardness hypotheses and for CQs without self-joins). For this class of queries and with n denoting the size of the input, the data complexity of our ranked enumeration approach for the time to the kth CQ answer is O(n+klogk), which is only a logarithmic factor slower than the O(n+k) unranked-enumeration guarantee. A core insight of our work is that ranked enumeration for CQs is closely related to Dynamic Programming and the fundamental task of path enumeration in a weighted DAG. We uncover a previously unknown tradeoff: one any-k algorithm has lower complexity when the number of returned answers is small, the other when their number is large. This tradeoff is eliminated under a stricter monotonicity property that we define and exploit for a novel algorithm that asymptotically dominates all previously known alternatives, including Eppstein's algorithm for sum-of-weights path enumeration. We empirically demonstrate the findings of our theoretical analysis in an experimental study that highlights the superiority of our approach over the join-then-rank approach that existing database systems follow.

下载PDF全文

下载文献需遵守相关版权规定

论文标题