论文标题

概率查询评估带有袋子语义的评估

Probabilistic Query Evaluation with Bag Semantics

论文作者

Grohe, Martin, Lindner, Peter, Standke, Christoph

论文摘要

我们研究了在袋子语义下评估概率数据库查询的复杂性。我们专注于自由加入自由结合性查询和概率数据库,在这些查询中,不同事实的发生是独立的,这是元素独立的概率数据库对袋子语义设置的自然概括。对于设定的语义,即使对于更一般的连接查询工会类别类别,该问题的数据复杂性也可以很好地理解:它是在多项式时间或#p-hard中,具体取决于查询(Dalvi&Suciu,JACM,JACM 2012)。 袋子概率数据库的合理通用模型可能具有无界的多重性。在这种情况下,概率数据库不再是有限的,需要仔细处理表示机制。此外,布尔查询的答案是(可能是所有)非负整数的概率分布,而不是{true,false}上的概率分布。因此,我们讨论了概率查询评估的两种口味:计算答案元组多重性的期望,并计算出答案中最多k时k tim k tim k times k tim k time k的概率。在表示系统上有轻度的技术假设,事实证明,即使是关于连接的查询工会,期望也很容易计算。对于查询答案概率,我们获得了多项式时间的溶解度与自由结合性查询的#p- hard度之间的二分法。

We study the complexity of evaluating queries on probabilistic databases under bag semantics. We focus on self-join free conjunctive queries, and probabilistic databases where occurrences of different facts are independent, which is the natural generalization of tuple-independent probabilistic databases to the bag semantics setting. For set semantics, the data complexity of this problem is well understood, even for the more general class of unions of conjunctive queries: it is either in polynomial time, or #P-hard, depending on the query (Dalvi & Suciu, JACM 2012). A reasonably general model of bag probabilistic databases may have unbounded multiplicities. In this case, the probabilistic database is no longer finite, and a careful treatment of representation mechanisms is required. Moreover, the answer to a Boolean query is a probability distribution over (possibly all) non-negative integers, rather than a probability distribution over { true, false }. Therefore, we discuss two flavors of probabilistic query evaluation: computing expectations of answer tuple multiplicities, and computing the probability that a tuple is contained in the answer at most k times for some parameter k. Subject to mild technical assumptions on the representation systems, it turns out that expectations are easy to compute, even for unions of conjunctive queries. For query answer probabilities, we obtain a dichotomy between solvability in polynomial time and #P-hardness for self-join free conjunctive queries.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源