Paper Title
A Lightweight Algorithm to Uncover Deep Relationships in Data Tables
Paper Authors
Paper Abstract
Much of the data we collect today is tabular, with rows as records and columns as the attributes associated with each record. Understanding the structural relationships in tabular data can greatly facilitate the data science process. Traditionally, much of this relational information is stored in the table schema and maintained by its creators, usually domain experts. In this paper, we develop automated methods to uncover deep relationships in a single data table without expert or domain knowledge. Our method decomposes a data table into layers of smaller tables, revealing its deep structure. The key to our approach is a computationally lightweight forward addition algorithm that we developed to recursively extract functional dependencies between table columns and that scales to tables with many columns. With our solution, data scientists are provided with automatically generated, data-driven insights when exploring new data sets.
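To make the idea of a functional dependency concrete, the following is a minimal sketch, not the algorithm from the paper, whose details the abstract does not give. It assumes a table held as a list of dictionaries and uses a generic greedy forward-selection rule: columns are added one at a time until the chosen set determines the target column. The names `count_groups`, `ambiguity` and `forward_addition`, the selection criterion and the sample data are all hypothetical, chosen only for illustration.

```python
def count_groups(rows, cols):
    """Number of distinct value combinations over the given columns."""
    return len({tuple(row[c] for c in cols) for row in rows})


def ambiguity(rows, lhs, target):
    """Zero exactly when the columns in lhs functionally determine target.

    If every lhs combination maps to a single target value, adding the
    target column creates no new distinct combinations.
    """
    return count_groups(rows, lhs + [target]) - count_groups(rows, lhs)


def forward_addition(rows, target, candidates):
    """Greedily grow a column set that determines target (illustrative only).

    Returns the chosen columns, or None if even all candidates together
    fail to determine the target.
    """
    chosen = []
    remaining = [c for c in candidates if c != target]
    while ambiguity(rows, chosen, target) > 0:
        if not remaining:
            return None
        # Assumed criterion: lowest remaining ambiguity first, ties broken
        # in favour of the column set with fewer distinct combinations.
        best = min(
            remaining,
            key=lambda c: (
                ambiguity(rows, chosen + [c], target),
                count_groups(rows, chosen + [c]),
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical sample table.
rows = [
    {"city": "Paris", "country": "France", "continent": "Europe", "sales": 10},
    {"city": "Lyon", "country": "France", "continent": "Europe", "sales": 7},
    {"city": "Osaka", "country": "Japan", "continent": "Asia", "sales": 10},
    {"city": "Paris", "country": "France", "continent": "Europe", "sales": 3},
]

print(forward_addition(rows, "continent", ["sales", "country", "city"]))  # ['country']
print(forward_addition(rows, "sales", ["country", "continent"]))  # None
```

In this toy table, `country` alone determines `continent`, so `continent` could be moved into a smaller country-level table, which is the kind of layered decomposition the abstract describes. A greedy search of this sort is not guaranteed to find a minimal determinant set.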