Paper title
Global $k$-means$++$: an effective relaxation of the global $k$-means clustering algorithm
Paper authors
Paper abstract
The $k$-means algorithm is a prevalent clustering method due to its simplicity, effectiveness, and speed. However, its main disadvantage is its high sensitivity to the initial positions of the cluster centers. Global $k$-means is a deterministic algorithm proposed to tackle the random initialization problem of $k$-means, but it is well known to incur a high computational cost. It partitions the data into $K$ clusters by solving all $k$-means sub-problems incrementally for $k=1,\ldots,K$. For each $k$-cluster problem, the method executes the $k$-means algorithm $N$ times, where $N$ is the number of data points. In this paper, we propose the \emph{global $k$-means\texttt{++}} clustering algorithm, which is an effective way of acquiring clustering solutions of quality akin to those of global $k$-means at a reduced computational load. This is achieved by exploiting the center selection probability that is effectively used in the $k$-means\texttt{++} algorithm. The proposed method has been tested and compared on various benchmark datasets, yielding very satisfactory results in terms of clustering quality and execution speed.
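
The abstract does not spell out the procedure, so the following is only a minimal Python sketch of the idea as described, under stated assumptions. It grows the solution one center at a time, as in global $k$-means, but instead of trying all $N$ data points as the new center it tries a few candidates drawn with the $k$-means\texttt{++} selection probability (proportional to the squared distance to the nearest existing center). The function names `lloyd` and `global_kmeans_pp_sketch`, the parameter `n_candidates` and its default value are hypothetical choices made for illustration and are not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centers, iters=50):
    """Plain k-means (Lloyd iterations) started from the given centers.

    Returns (centers, clustering_error), where the error is the sum of
    squared distances from every point to its nearest center.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Update step: move each center to the mean of its group.
        # An empty group keeps its previous center.
        new = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new == centers:
            break
        centers = new
    error = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, error


def global_kmeans_pp_sketch(points, K, n_candidates=5, seed=0):
    """Incrementally solve the k = 1..K problems (illustrative sketch only).

    Assumption: n_candidates new-center candidates are sampled per step
    with k-means++ style probabilities; the paper may differ in detail.
    Returns a dict mapping k to (centers, clustering_error).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # k = 1: the optimal single center is the mean of the data.
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    centers = [mean]
    solutions = {1: (centers, sum(sq_dist(p, mean) for p in points))}
    for k in range(2, K + 1):
        # k-means++ weights: squared distance to the nearest current center.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            break  # every point already coincides with a center
        candidates = rng.choices(points, weights=weights, k=n_candidates)
        best = None
        for cand in candidates:
            # Keep the k-1 previous centers, add the candidate, run k-means.
            trial = lloyd(points, centers + [tuple(cand)])
            if best is None or trial[1] < best[1]:
                best = trial
        centers = best[0]
        solutions[k] = best
    return solutions
```

Calling `global_kmeans_pp_sketch(points, K)` on a list of coordinate tuples returns the solution for every $k$ from $1$ to $K$. In this sketch each step costs `n_candidates` runs of $k$-means rather than the $N$ runs per step that the abstract attributes to global $k$-means, which is where the reduced computational load would come from.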