Fusion Subspace Clustering for Incomplete Data

Paper Title

Fusion Subspace Clustering for Incomplete Data

Authors

Usman Mahmood, Daniel Pimentel-Alarcón

Abstract

This paper introduces fusion subspace clustering, a novel method to learn low-dimensional structures that approximate large-scale yet highly incomplete data. The main idea is to assign each datum to a subspace of its own, and minimize the distance between the subspaces of all data, so that subspaces of the same cluster get fused together. Our method allows low, high, and even full-rank data; it directly accounts for noise, and its sample complexity approaches the information-theoretic limit. In addition, our approach provides a natural model-selection clusterpath and a direct completion method. We give convergence guarantees, analyze computational complexity, and show through extensive experiments on real and synthetic data that our approach performs comparably to the state-of-the-art with complete data, and dramatically better if data is missing.
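The main idea in the abstract can be illustrated with a small sketch of a fusion-style objective. This is not the authors' implementation: the abstract does not state the exact cost, so the code below assumes a least-squares fit on the observed entries of each datum plus a penalty on the pairwise distance between per-datum subspaces. The function names, the weight `lam`, and the choice of distance (Frobenius norm between projection matrices) are all assumptions made for illustration.

```python
import numpy as np

def projection(U):
    """Orthogonal projector onto the column span of U (d x r, full column rank)."""
    Q, _ = np.linalg.qr(U)
    return Q @ Q.T

def fusion_objective(X, mask, bases, lam):
    """Hypothetical fusion-style cost (an assumption, not the paper's formula).

    X     : d x n data matrix (values at unobserved entries are ignored)
    mask  : d x n boolean matrix, True where an entry is observed
    bases : list of n arrays (d x r), one candidate subspace basis per datum
    lam   : weight of the fusion penalty
    """
    n = X.shape[1]

    # Data-fit term: residual of each datum on its own subspace,
    # measured only on the observed coordinates.
    fit = 0.0
    for i in range(n):
        obs = mask[:, i]
        U_obs = bases[i][obs, :]
        x_obs = X[obs, i]
        coef, *_ = np.linalg.lstsq(U_obs, x_obs, rcond=None)
        fit += np.sum((x_obs - U_obs @ coef) ** 2)

    # Fusion term: sum of pairwise distances between subspaces, so that
    # subspaces belonging to the same cluster are pulled together.
    projs = [projection(U) for U in bases]
    fuse = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            fuse += np.linalg.norm(projs[i] - projs[j], "fro")

    return fit + lam * fuse
```

Under these assumptions, the cost is zero when every datum lies in one shared subspace and all per-datum bases coincide, and it grows as the bases drift apart. Sweeping `lam` from small to large would trace out progressively more fused solutions, which is one way to read the clusterpath mentioned in the abstract.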
