Paper title
Fusion Subspace Clustering for Incomplete Data
Paper authors
Abstract
This paper introduces {\em fusion subspace clustering}, a novel method to learn low-dimensional structures that approximate large-scale yet highly incomplete data. The main idea is to assign each datum to a subspace of its own, and to minimize the distance between the subspaces of all data, so that subspaces of the same cluster get {\em fused} together. Our method allows low, high, and even full-rank data; it directly accounts for noise, and its sample complexity approaches the information-theoretic limit. In addition, our approach provides a natural model selection {\em clusterpath}, and a direct completion method. We give convergence guarantees, analyze computational complexity, and show through extensive experiments on real and synthetic data that our approach performs comparably to the state-of-the-art with complete data, and dramatically better when data are missing.
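To make the main idea concrete, the following is a minimal sketch of a fusion-style objective, not the paper's actual formulation. Everything specific here is an assumption for illustration: the abstract does not state the loss, so the sketch uses a least-squares residual on observed entries as the data-fit term, the squared projection (chordal) distance between subspaces as the fusion penalty, and hypothetical names (`residual`, `projection_distance_sq`, `fusion_objective`, `lam`).

```python
import numpy as np

def residual(x, mask, U):
    """Squared distance from the observed entries of x to span(U),
    with U restricted to the same observed rows (assumed data-fit term)."""
    U_obs = U[mask]          # rows of the basis where x is observed
    x_obs = x[mask]          # observed entries of the datum
    coef, *_ = np.linalg.lstsq(U_obs, x_obs, rcond=None)
    r = x_obs - U_obs @ coef
    return float(r @ r)

def projection_distance_sq(U, V):
    """Squared Frobenius distance between the orthogonal projectors onto
    span(U) and span(V) (assumed choice of subspace distance)."""
    Qu, _ = np.linalg.qr(U)  # orthonormal basis of span(U)
    Qv, _ = np.linalg.qr(V)  # orthonormal basis of span(V)
    D = Qu @ Qu.T - Qv @ Qv.T
    return float(np.sum(D * D))

def fusion_objective(X, mask, bases, lam):
    """Data fit of each column to its own subspace, plus lam times the
    sum of pairwise subspace distances (hypothetical objective)."""
    n = X.shape[1]
    fit = sum(residual(X[:, i], mask[:, i], bases[i]) for i in range(n))
    fuse = sum(projection_distance_sq(bases[i], bases[j])
               for i in range(n) for j in range(i + 1, n))
    return fit + lam * fuse

# Two incomplete data points drawn from the same 1-D subspace of R^3.
u = np.array([[1.0], [2.0], [3.0]])
X = np.hstack([u, -2.0 * u])
mask = np.array([[True, True], [True, False], [False, True]])

fused_value = fusion_objective(X, mask, [u, u], lam=1.0)
v = np.array([[1.0], [0.0], [0.0]])
unfused_value = fusion_objective(X, mask, [u, v], lam=1.0)
```

Under these assumptions, giving both data points the same, correct subspace drives both terms to (numerically) zero, while assigning the second point a mismatched subspace increases the objective. Sweeping the hypothetical weight `lam` from small to large is one plausible way to read the "clusterpath" idea: as the penalty grows, more subspaces are pushed to coincide.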