论文标题
优化关系数据库中的光标循环
Optimizing Cursor Loops in Relational Databases
论文作者
论文摘要
在DBM之外运行的应用程序,以及用户定义的功能(UDFS)和在DBMS内运行的存储过程,遍历SQL查询结果的循环非常普遍。可以说,面向设定的操作更有效,应该优先于迭代。但是,从现实世界中的用例中,很明显,在许多情况下,对查询结果的循环是不可避免的,并且受到许多用户的青睐。这种称为光标循环的循环具有巨大的权衡和间接费用W.R.T.绩效,资源消耗和并发。 我们提出了agghify,这是一种优化循环而不是查询结果,以克服所有这些开销。它通过自动生成语义上等效的自定义聚集体来实现这一目标。因此,通过重写查询以使用此生成的聚集体来完全消除循环。该技术具有多个优点,例如:(i)在整个光标循环操作的管道上,而不是物质化,(ii)将循环计算从应用层向下计算到DBMS,更接近数据,(iii)利用现有的用于优化聚合功能的工作,从而导致有效的Query计划。我们描述了基于易合的技术,并介绍了我们对基准测试的实验评估以及实际工作量,这些技术证明了该技术的重大好处。
Loops that iterate over SQL query results are quite common, both in application programs that run outside the DBMS, as well as User Defined Functions (UDFs) and stored procedures that run within the DBMS. It can be argued that set-oriented operations are more efficient and should be preferred over iteration; but from real-world use cases, it is clear that loops over query results are inevitable in many situations, and are preferred by many users. Such loops, known as cursor loops, come with huge trade-offs and overheads w.r.t. performance, resource consumption and concurrency. We present Aggify, a technique for optimizing loops over query results that overcomes all these overheads. It achieves this by automatically generating custom aggregates that are equivalent in semantics to the loop. Thereby, Aggify completely eliminates the loop by rewriting the query to use this generated aggregate. This technique has several advantages such as: (i) pipelining of entire cursor loop operations instead of materialization, (ii) pushing down loop computation from the application layer into the DBMS, closer to the data, (iii) leveraging existing work on optimization of aggregate functions, resulting in efficient query plans. We describe the technique underlying Aggify and present our experimental evaluation over benchmarks as well as real workloads that demonstrate the significant benefits of this technique.