Genetic Programming for Evolving a Front of Interpretable Models for Data Visualisation

Paper title

Genetic Programming for Evolving a Front of Interpretable Models for Data Visualisation

Authors

Andrew Lensen, Bing Xue, Mengjie Zhang

Abstract

Data visualisation is a key tool in data mining for understanding big datasets. Many visualisation methods have been proposed, including the well-regarded state-of-the-art method t-Distributed Stochastic Neighbour Embedding. However, the most powerful visualisation methods have a significant limitation: the manner in which they create their visualisation from the original features of the dataset is completely opaque. Many domains require an understanding of the data in terms of the original features; there is hence a need for powerful visualisation methods which use understandable models. In this work, we propose a genetic programming approach named GP-tSNE for evolving interpretable mappings from a dataset to high-quality visualisations. A multi-objective approach is designed that produces a variety of visualisations in a single run which give different trade-offs between visual quality and model complexity. Testing against baseline methods on a variety of datasets shows the clear potential of GP-tSNE to allow deeper insight into data than that provided by existing visualisation methods. We further highlight the benefits of a multi-objective approach through an in-depth analysis of a candidate front, which shows how multiple models can
