论文标题

基于集群合并的公用事业有效差异私有K-均值聚类

Utility-efficient Differentially Private K-means Clustering based on Cluster Merging

论文作者

Ni, Tianjiao, Qiao, Minghao, Chen, Zhili, Zhang, Shun, Zhong, Hong

论文摘要

差异隐私被广泛用于数据分析。具有差异隐私的最先进的$ k $ -MEANS聚类算法通常会为每个迭代计算的质心增加相等数量的噪声。在本文中,我们提出了一种新颖的私有$ k $ -means聚类算法DP-KCCM,可通过添加自适应噪声和合并簇来大大改善聚类的实用性。具体来说,要获得具有差异隐私的$ K $簇,该算法首先生成$ n \ times k $初始质心,为每次迭代添加了自适应噪声以获得$ n \ times k $ clusters,最后将这些簇合并为$ k $。从理论上讲,我们证明了拟议算法的差异隐私。出乎意料的是,广泛的实验结果表明:1)群集合并与等量的噪声可以在某种程度上改善效用; 2)尽管增加自适应噪声并不能改善效用,但是将群集合并和自适应噪声结合起来进一步改善了效用。

Differential privacy is widely used in data analysis. State-of-the-art $k$-means clustering algorithms with differential privacy typically add an equal amount of noise to centroids for each iterative computation. In this paper, we propose a novel differentially private $k$-means clustering algorithm, DP-KCCM, that significantly improves the utility of clustering by adding adaptive noise and merging clusters. Specifically, to obtain $k$ clusters with differential privacy, the algorithm first generates $n \times k$ initial centroids, adds adaptive noise for each iteration to get $n \times k$ clusters, and finally merges these clusters into $k$ ones. We theoretically prove the differential privacy of the proposed algorithm. Surprisingly, extensive experimental results show that: 1) cluster merging with equal amounts of noise improves the utility somewhat; 2) although adding adaptive noise only does not improve the utility, combining both cluster merging and adaptive noise further improves the utility significantly.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源