Title
Approaches For Multi-View Redescription Mining
Authors
Abstract
The task of redescription mining explores ways to re-describe different subsets of entities contained in a dataset and to reveal non-trivial associations between different subsets of attributes, called views. This interesting and challenging task arises in various scientific fields and is addressed by a number of approaches that obtain redescriptions and allow for the exploration and analysis of attribute associations. The main limitation of existing approaches is their inability to use more than two views. Our work alleviates this drawback. We present a memory-efficient, extensible multi-view redescription mining framework that can relate more than two views, i.e. disjoint sets of attributes describing one set of entities. To generate redescriptions, the framework can use any multi-target regression or multi-label classification algorithm whose models can be represented as sets of rules. Multi-view redescriptions are built from initially created two-view redescriptions using an incremental view-extending heuristic. In this work, we use different types of Predictive Clustering Tree algorithms (regular, extra, and with random output selection) and their Random Forest ensembles to improve the quality of the final redescription sets and/or reduce the execution time needed to generate them. We provide multiple performance analyses of the proposed framework and compare it against the naive approach to multi-view redescription mining. We demonstrate the usefulness of the proposed multi-view extension on several datasets, including a use case on understanding machine learning models, a topic of growing importance in machine learning and artificial intelligence in general.
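The abstract does not spell out the view-extending heuristic, so the following is only a minimal sketch of the general idea under stated assumptions. Each query is represented by its support (the set of entity ids it holds for), a redescription is a list of queries with one per view, and accuracy is measured with the Jaccard index generalised to many views (entities satisfying all queries divided by entities satisfying at least one). Starting from a two-view redescription, the sketch greedily adds, view by view, the candidate query that keeps the Jaccard index highest. The function names `jaccard` and `extend_redescription`, the `min_jaccard` threshold, and the pre-computed candidate supports are all hypothetical; in the framework, candidate queries would come from rules extracted from Predictive Clustering Trees, which is not shown here.

```python
from typing import List, Optional, Set

Support = Set[int]  # ids of the entities for which a query holds (assumed representation)


def jaccard(supports: List[Support]) -> float:
    """Multi-view Jaccard index: |entities in every support| / |entities in any support|."""
    union = set().union(*supports)
    if not union:
        return 0.0
    common = set.intersection(*supports)
    return len(common) / len(union)


def extend_redescription(
    redescription: List[Support],
    candidates_per_view: List[List[Support]],
    min_jaccard: float = 0.5,  # hypothetical acceptance threshold
) -> List[Support]:
    """Greedily extend a redescription with one query per additional view.

    For each new view, the candidate that maximizes the Jaccard index of the
    extended redescription is kept; extension stops as soon as the best
    candidate falls below min_jaccard.
    """
    current = list(redescription)
    for candidates in candidates_per_view:
        best: Optional[Support] = None
        best_score = -1.0
        for cand in candidates:
            score = jaccard(current + [cand])
            if score > best_score:  # on ties, the first candidate wins
                best, best_score = cand, score
        if best is None or best_score < min_jaccard:
            break  # no candidate keeps the redescription accurate enough
        current.append(best)
    return current


if __name__ == "__main__":
    two_view = [{1, 2, 3, 4}, {1, 2, 3, 5}]  # Jaccard = 3 / 5
    third_view_candidates = [{1, 2, 3}, {6, 7}, {1, 2, 3, 4, 5}]
    extended = extend_redescription(two_view, [third_view_candidates])
    print(len(extended), jaccard(extended))  # 3 0.6
```

A greedy, one-view-at-a-time extension keeps the search cost linear in the number of views, at the price of possibly missing combinations that only score well jointly; the framework's actual heuristic may differ.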