Paper Title

Cleaning Denial Constraint Violations through Relaxation

Paper Authors

Stella Giannakopoulou, Manos Karpathiotakis, Anastasia Ailamaki

Paper Abstract

Data cleaning is a time-consuming process that depends on the data analysis that users perform. Existing solutions treat data cleaning as a separate offline process that takes place before analysis begins. Applying data cleaning before analysis assumes a priori knowledge of the inconsistencies and the query workload, thereby requiring effort on understanding and cleaning the data that is unnecessary for the analysis. We propose an approach that performs probabilistic repair of denial constraint violations on-demand, driven by the exploratory analysis that users perform. We introduce Daisy, a system that seamlessly integrates data cleaning into the analysis by relaxing query results. Daisy executes analytical query-workloads over dirty data by weaving cleaning operators into the query plan. Our evaluation shows that Daisy adapts to the workload and outperforms traditional offline cleaning on both synthetic and real-world workloads.
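To make the idea of query-driven cleaning concrete, the following is a minimal sketch and not Daisy's actual algorithm. It assumes a hypothetical table with `zip` and `city` columns, a single denial constraint (no two tuples may share a zip but differ in city), and a simple frequency-based weighting of candidate repairs. All names, data, and the weighting scheme are illustrative assumptions. The sketch answers a selection query over dirty data, relaxes the result with tuples that conflict with it, and attaches candidate values only to the tuples the query touched.

```python
from collections import Counter

# Hypothetical dirty table; column names and values are illustrative only.
ROWS = [
    {"id": 1, "zip": "90001", "city": "Los Angeles"},
    {"id": 2, "zip": "90001", "city": "Los Angeles"},
    {"id": 3, "zip": "90001", "city": "San Francisco"},
    {"id": 4, "zip": "10001", "city": "New York"},
    {"id": 5, "zip": "94105", "city": "San Francisco"},
]


def violates(t1, t2):
    """Assumed denial constraint: two tuples must not share a zip but differ in city."""
    return t1["zip"] == t2["zip"] and t1["city"] != t2["city"]


def query_with_cleaning(rows, city):
    """Answer a selection on `city` and repair only the tuples the query involves."""
    # Step 1: evaluate the query directly over the dirty data.
    result = [r for r in rows if r["city"] == city]
    result_ids = {r["id"] for r in result}

    # Step 2: relax the result with tuples that conflict with a result tuple,
    # since a repair could move them into or out of the answer.
    relaxed = result + [
        r for r in rows
        if r["id"] not in result_ids and any(violates(r, q) for q in result)
    ]

    # Step 3: attach candidate cities, weighted by frequency within the same zip
    # (an assumed, simplistic stand-in for probabilistic repair).
    answer = []
    for r in relaxed:
        group = [g["city"] for g in rows if g["zip"] == r["zip"]]
        counts = Counter(group)
        candidates = {c: n / len(group) for c, n in counts.items()}
        answer.append({"id": r["id"], "zip": r["zip"], "city_candidates": candidates})
    return answer


if __name__ == "__main__":
    for row in query_with_cleaning(ROWS, "San Francisco"):
        print(row)
```

In this sketch the tuple with id 4 is never inspected, because the query does not reach it. That is the point of cleaning on demand: effort is spent only on data the analysis actually uses.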
