Paper Title
An Improved Probability Propagation Algorithm for Density Peak Clustering Based on Natural Nearest Neighborhood
Paper Authors
Paper Abstract
Clustering by fast search and find of density peaks (DPC) (Science, 2014) has proven to be a promising clustering approach that efficiently discovers cluster centers by finding density peaks. The accuracy of DPC depends on the cutoff distance ($d_c$), the number of clusters ($k$), and the selection of cluster centers. Moreover, the final allocation strategy is sensitive and has poor fault tolerance. These shortcomings make the algorithm sensitive to its parameters and applicable only to certain datasets. To overcome these limitations, this paper presents an improved probability propagation algorithm for density peak clustering based on the natural nearest neighborhood (DPC-PPNNN). By introducing the ideas of the natural nearest neighborhood and probability propagation, DPC-PPNNN realizes a nonparametric clustering process and becomes applicable to more complex datasets. In experiments on several datasets, DPC-PPNNN outperforms DPC, K-means, and DBSCAN.
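To make the dependence on $d_c$ and $k$ concrete, the following is a minimal sketch of the baseline DPC center-selection step only, not of DPC-PPNNN, whose details are not given in the abstract. It assumes the commonly used hard-cutoff density (the number of points closer than $d_c$) and ranks points by the product of density and distance to the nearest denser point. The function name `dpc_decision_values`, the toy data, and the values `d_c = 0.08` and `k = 2` are illustrative assumptions.

```python
import numpy as np

def dpc_decision_values(X, d_c):
    """Baseline DPC quantities (hypothetical helper, hard-cutoff kernel assumed).

    rho[i]   : number of other points within distance d_c of point i
    delta[i] : distance from point i to the nearest point of higher density;
               for a point of maximal density, the largest distance from it
    """
    # Pairwise Euclidean distance matrix
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density; subtract 1 so a point does not count itself
    rho = (dist < d_c).sum(axis=1) - 1

    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            delta[i] = dist[i].max()
    return rho, delta

# Toy data (assumed for illustration): two small, well-separated groups
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.05, 5.05],
])

d_c = 0.08  # cutoff distance, chosen by hand
k = 2       # number of clusters, chosen by hand

rho, delta = dpc_decision_values(X, d_c)
gamma = rho * delta                 # points with high rho AND high delta are center candidates
centers = np.argsort(gamma)[-k:]    # indices of the k best candidates
print(sorted(centers.tolist()))
```

Both `d_c` and `k` have to be supplied by the user in this sketch, which is the parameter sensitivity the abstract refers to; a different `d_c` changes every `rho` value and can change which points are selected as centers.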