Paper Title

NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT

Authors

Rojeh Hayek, Oded Shmueli

Abstract

SQL queries with the AND, OR, and NOT operators constitute a broad class of heavily used queries, so estimating their cardinalities is important for query optimization. In addition, a query planner requires the set-theoretic cardinality (i.e., without duplicates) both for queries with DISTINCT and during planning, for example when considering sorting options. Yet despite the importance of estimating query cardinalities in the presence of DISTINCT, AND, OR, and NOT, many cardinality estimation methods are limited to conjunctive queries and count duplicates.

This work focuses on two methods that address this deficiency and can be applied to any such limited cardinality estimation model. First, we describe PUNQ, a specialized deep learning scheme tailored to representing conjunctive SQL queries and to predicting the percentage of unique rows in a query result that contains duplicate rows. Using the percentages predicted by PUNQ, we can transform any cardinality estimation method that handles only conjunctive queries and counts duplicates (e.g., MSCN) into a method that estimates query cardinalities without duplicates. This enables estimating the cardinalities of queries with the DISTINCT keyword. Second, we describe GenCrd, a recursive algorithm that extends any cardinality estimation method M that handles only conjunctive queries into one that estimates cardinalities for more general queries (those that include AND, OR, and NOT), without changing M itself.

Our evaluation is carried out on a challenging, real-world database with general queries that include either the DISTINCT keyword or the AND, OR, and NOT operators. Experimentally, we show that the proposed methods produce cardinality estimates with the same level of accuracy as the original methods they transform.
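
Per the abstract, the DISTINCT transformation multiplies an estimate that counts duplicates by the share of unique rows that PUNQ predicts. The minimal Python sketch below illustrates that composition under stated assumptions. It assumes the share is expressed as a fraction in [0, 1], and it treats both the base estimator M (e.g., MSCN) and the uniqueness model as opaque callables. The names `unique_fraction` and `make_distinct_estimator`, and the use of a string as the query representation, are hypothetical. The neural model itself is not shown.

```python
from typing import Callable, Hashable, Sequence

# Hypothetical interfaces: a query is treated as an opaque value (here a string).
Estimator = Callable[[str], float]        # query -> cardinality estimate (duplicates counted)
UniquenessModel = Callable[[str], float]  # query -> predicted fraction of unique rows


def unique_fraction(result_rows: Sequence[Hashable]) -> float:
    """Target of the kind PUNQ is described as predicting: distinct rows / all rows."""
    if not result_rows:
        return 1.0  # convention only: the ratio is undefined for an empty result
    return len(set(result_rows)) / len(result_rows)


def make_distinct_estimator(M: Estimator, uniq: UniquenessModel) -> Estimator:
    """Wrap a duplicate-counting estimator M so that it estimates DISTINCT cardinalities."""
    def estimate(query: str) -> float:
        frac = min(max(uniq(query), 0.0), 1.0)  # keep the predicted share in [0, 1]
        return frac * M(query)                  # M itself is left unchanged
    return estimate


# Usage with stand-in models: 4 result rows, 3 of them distinct.
rows = [("a", 1), ("a", 1), ("b", 2), ("c", 3)]
print(unique_fraction(rows))  # 0.75

distinct_est = make_distinct_estimator(lambda q: 400.0, lambda q: 0.75)
print(distinct_est("q1"))  # 300.0
```

The clamp is a defensive choice for a regression output. The abstract does not say whether PUNQ's output is already bounded.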

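The abstract describes GenCrd only at a high level. One way to see how a conjunctive-only method M can answer AND/OR/NOT queries without being modified is through the set identities |A OR B| = |A| + |B| - |A AND B| and |P AND NOT A| = |P| - |P AND A|. The Python sketch below illustrates those identities and is not the authors' GenCrd. It makes three assumptions. First, M can be called on any conjunction of positive atomic predicates, with the empty conjunction meaning the unfiltered query. Second, predicates use a made-up tuple encoding. Third, the predicate is flattened to DNF before inclusion-exclusion is applied. This approach needs a number of calls to M that grows exponentially with the number of disjuncts and negated atoms.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical predicate encoding (not from the paper):
#   ("atom", name) | ("not", p) | ("and", p, q) | ("or", p, q)
Literal = Tuple[str, bool]                         # (atomic predicate name, is_positive)
Conj = FrozenSet[Literal]                          # a conjunction of literals
ConjEstimator = Callable[[FrozenSet[str]], float]  # M: positive conjunction -> cardinality


def to_dnf(p: tuple, negate: bool = False) -> List[Conj]:
    """Rewrite a predicate as an OR of conjunctions, pushing NOT down (De Morgan)."""
    op = p[0]
    if op == "atom":
        return [frozenset({(p[1], not negate)})]
    if op == "not":
        return to_dnf(p[1], not negate)
    # Under negation, AND becomes OR and OR becomes AND.
    is_and = (op == "and") != negate
    left, right = to_dnf(p[1], negate), to_dnf(p[2], negate)
    if is_and:
        return [a | b for a in left for b in right]  # distribute AND over OR
    return left + right


def conj_card(conj: Conj, M: ConjEstimator) -> float:
    """|P AND NOT n1 ... AND NOT nk| = sum over subsets T of {n_i} of (-1)^|T| * M(P | T)."""
    pos = frozenset(name for name, positive in conj if positive)
    neg = [name for name, positive in conj if not positive]
    total = 0.0
    for k in range(len(neg) + 1):
        for subset in combinations(neg, k):
            total += (-1) ** k * M(pos | frozenset(subset))
    return total


def general_card(p: tuple, M: ConjEstimator) -> float:
    """Inclusion-exclusion over the DNF disjuncts; M itself is never modified."""
    disjuncts = to_dnf(p)
    total = 0.0
    for k in range(1, len(disjuncts) + 1):
        for subset in combinations(disjuncts, k):
            merged = frozenset().union(*subset)  # AND of the chosen disjuncts
            total += (-1) ** (k + 1) * conj_card(merged, M)
    return total


# Usage with an exact stand-in for M over a toy table holding the integers 0..19.
ROWS = range(20)
ATOMS = {
    "even": lambda r: r % 2 == 0,
    "lt10": lambda r: r < 10,
    "div3": lambda r: r % 3 == 0,
}


def exact_M(atoms: FrozenSet[str]) -> float:
    # Plays the role of a conjunctive-only method; the empty set means "no filter".
    return sum(all(ATOMS[a](r) for a in atoms) for r in ROWS)


# even AND NOT lt10 -> {10,12,14,16,18}; div3 -> {0,3,...,18}; the union has 10 rows
query = ("or", ("and", ("atom", "even"), ("not", ("atom", "lt10"))), ("atom", "div3"))
print(general_card(query, exact_M))  # 10.0
```

In this sketch, contradictory conjunctions such as `x AND NOT x` need no special case, because their inclusion-exclusion terms cancel to 0.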