Global $k$-means$++$: an effective relaxation of the global $k$-means clustering algorithm

Paper title

Global $k$-means$++$: an effective relaxation of the global $k$-means clustering algorithm

Authors

Georgios Vardakas, Aristidis Likas

Abstract

The $k$-means algorithm is a prevalent clustering method due to its simplicity, effectiveness, and speed. Its main disadvantage, however, is its high sensitivity to the initial positions of the cluster centers. Global $k$-means is a deterministic algorithm proposed to tackle the random initialization problem of $k$-means, but it is well known to incur a high computational cost. It partitions the data into $K$ clusters by solving all $k$-means sub-problems incrementally for $k=1,\ldots, K$. For each $k$-cluster problem, the method executes the $k$-means algorithm $N$ times, where $N$ is the number of data points. In this paper, we propose the \emph{global $k$-means\texttt{++}} clustering algorithm, an effective way of acquiring clustering solutions of quality akin to those of global $k$-means at a reduced computational load. This is achieved by exploiting the center selection probability that is effectively used in the $k$-means\texttt{++} algorithm. The proposed method has been tested and compared on various benchmark datasets, yielding very satisfactory results in terms of clustering quality and execution speed.
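
The abstract does not spell out the procedure, so the following is only a rough sketch of how the idea might look, under stated assumptions. It assumes that the solution for $k$ clusters is built from the solution for $k-1$ clusters by trying a small number of candidate positions for the new center, and that candidates are drawn with probability proportional to the squared distance to the nearest existing center (the $k$-means\texttt{++} selection rule). The names `lloyd`, `global_kmeans_pp_sketch`, and `n_candidates` are hypothetical and do not come from the paper.

```python
import numpy as np


def lloyd(X, centers, n_iter=100):
    """Run plain k-means (Lloyd) iterations from the given initial centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = centers.copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.min(axis=1).sum()


def global_kmeans_pp_sketch(X, K, n_candidates=10, seed=0):
    """Incrementally solve the k = 1..K problems (assumed scheme, not the paper's code).

    Assumes X holds clearly more than K distinct points.
    Returns a dict mapping k to (centers, sse).
    """
    rng = np.random.default_rng(seed)
    # The optimal 1-cluster solution is the data mean.
    centers = X.mean(axis=0, keepdims=True)
    solutions = {1: (centers, ((X - centers) ** 2).sum())}
    for k in range(2, K + 1):
        # k-means++ style selection probability: squared distance to nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        probs = d2 / d2.sum()
        n_draw = min(n_candidates, np.count_nonzero(probs))
        cand_idx = rng.choice(len(X), size=n_draw, replace=False, p=probs)
        best = None
        for i in cand_idx:
            # Previous k-1 centers plus one candidate data point as the new center.
            init = np.vstack([centers, X[i]])
            c, sse = lloyd(X, init)
            if best is None or sse < best[1]:
                best = (c, sse)
        centers = best[0]
        solutions[k] = best
    return solutions
```

In this sketch each sub-problem costs `n_candidates` runs of $k$-means instead of the $N$ runs that global $k$-means performs, which is where the reduced computational load described in the abstract would come from.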
