Paper title
NeuroCard: One Cardinality Estimator for All Tables
Paper authors
Paper abstract
Query optimizers rely on accurate cardinality estimates to produce good execution plans. Despite decades of research, existing cardinality estimators are inaccurate for complex queries, due to making lossy modeling assumptions and not capturing inter-table correlations. In this work, we show that it is possible to learn the correlations across all tables in a database without any independence assumptions. We present NeuroCard, a join cardinality estimator that builds a single neural density estimator over an entire database. Leveraging join sampling and modern deep autoregressive models, NeuroCard makes no inter-table or inter-column independence assumptions in its probabilistic modeling. NeuroCard achieves orders of magnitude higher accuracy than the best prior methods (a new state-of-the-art result of 8.5$\times$ maximum error on JOB-light), scales to dozens of tables, while being compact in space (several MBs) and efficient to construct or update (seconds to minutes).
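To illustrate why avoiding independence assumptions matters, the following is a minimal sketch on a hypothetical two-column toy table (the data and all function names are illustrative and not taken from the paper). It contrasts an estimate that multiplies per-column selectivities with one that uses the chain-rule factorization P(city) * P(country | city), which is the kind of factorization an autoregressive model learns; here the conditional is computed exactly from counts rather than by a neural network. The error measure is assumed to be the Q-error commonly used in cardinality estimation.

```python
# Hypothetical toy table with two strongly correlated columns: (city, country).
rows = [("Paris", "FR")] * 50 + [("Lyon", "FR")] * 30 + [("Rome", "IT")] * 20


def q_error(estimate, truth):
    """Multiplicative error factor; both values are clamped to at least 1."""
    estimate, truth = max(estimate, 1.0), max(truth, 1.0)
    return max(estimate / truth, truth / estimate)


def true_cardinality(city, country):
    """Exact number of rows matching both equality predicates."""
    return sum(1 for row in rows if row == (city, country))


def independent_estimate(city, country):
    """Assumes column independence: N * P(city) * P(country)."""
    n = len(rows)
    p_city = sum(1 for c, _ in rows if c == city) / n
    p_country = sum(1 for _, k in rows if k == country) / n
    return n * p_city * p_country


def chain_rule_estimate(city, country):
    """No independence assumption: N * P(city) * P(country | city)."""
    n = len(rows)
    n_city = sum(1 for c, _ in rows if c == city)
    if n_city == 0:
        return 0.0
    n_both = sum(1 for row in rows if row == (city, country))
    return n * (n_city / n) * (n_both / n_city)


if __name__ == "__main__":
    query = ("Rome", "IT")
    truth = true_cardinality(*query)
    for name, estimator in [("independent", independent_estimate),
                            ("chain rule", chain_rule_estimate)]:
        estimate = estimator(*query)
        print(f"{name}: estimate={estimate:.1f}, q-error={q_error(estimate, truth):.2f}")
```

On this toy table the independence-based estimate for `city = Rome AND country = IT` is off by roughly a factor of 5, while the chain-rule estimate matches the true count, because every Rome row has country IT and the conditional captures that correlation.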