论文标题
$ k $ -Means的几何方法
A Geometric Approach to $k$-means
论文作者
论文摘要
$ k $ - 均值聚类是各个学科的基本问题。这个问题是非convex,并且仅保证标准算法找到本地最佳。利用[1]中特征的局部解决方案的结构,我们提出了一个普遍的算法框架,用于逃避不良的局部解决方案并恢复全球解决方案(或地面真相)。该框架包括在以下两个步骤之间进行交替:(i)在局部解决方案中检测错误指定的簇,以及(ii)通过非本地操作改善当前的局部解决方案。我们讨论了这些步骤的实现,并阐明了拟议的框架如何从几何学角度统一文献中$ k $ - 均值算法的变体。此外,我们介绍了所提出的框架的两个自然扩展,其中最初的簇数被指定。我们为我们的方法提供理论上的理由,这是通过广泛的实验证实的。
$k$-means clustering is a fundamental problem in various disciplines. This problem is nonconvex, and standard algorithms are only guaranteed to find a local optimum. Leveraging the structure of local solutions characterized in [1], we propose a general algorithmic framework for escaping undesirable local solutions and recovering the global solution (or the ground truth). This framework consists of alternating between the following two steps iteratively: (i) detect mis-specified clusters in a local solution and (ii) improve the current local solution by non-local operations. We discuss implementation of these steps, and elucidate how the proposed framework unifies variants of $k$-means algorithm in literature from a geometric perspective. In addition, we introduce two natural extensions of the proposed framework, where the initial number of clusters is misspecified. We provide theoretical justification for our approach, which is corroborated with extensive experiments.