Title
Fair clustering via equitable group representations
Authors
Abstract
What does it mean for a clustering to be fair? One popular approach seeks to ensure that each cluster contains groups in (roughly) the same proportion in which they exist in the population. The normative principle at play is balance: any cluster might act as a representative of the data, and thus should reflect its diversity.

But clustering also captures a different form of representativeness. A core principle in most clustering problems is that a cluster center should represent its cluster by being "close" to the points associated with it. This lets us replace the points by their cluster centers without significant loss of fidelity, which is indeed a common "use case" for clustering. For such a clustering to be fair, the centers should "represent" different groups equally well. We call such a clustering a group-representative clustering.

In this paper, we study the structure and computation of group-representative clusterings. We show that this notion naturally parallels the development of fairness notions in classification, with direct analogs of ideas like demographic parity and equal opportunity. We demonstrate how these notions are distinct from, and cannot be captured by, balance-based notions of fairness. We present approximation algorithms for group-representative $k$-median clustering and complement them with an empirical evaluation on various real-world data sets.
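The gap between balance and group representativeness can be illustrated with a small sketch. For illustration only, it assumes that a group's representation cost is the average Euclidean distance from its points to their nearest center (a k-median-style cost), that a demographic-parity-style objective looks at the largest such cost across groups, and that balance is the smallest min/max group-count ratio over clusters. The helpers `group_costs` and `balance` are hypothetical names, and the paper's precise definitions may differ.

```python
import math
from collections import Counter, defaultdict

Point = tuple[float, float]


def nearest(p: Point, centers: list[Point]) -> int:
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def group_costs(points, groups, centers):
    """Average distance from each group's points to their nearest center.

    Assumption: this per-group average k-median cost stands in for
    "how well the centers represent a group"."""
    total = defaultdict(float)
    count = Counter(groups)
    for p, g in zip(points, groups):
        total[g] += math.dist(p, centers[nearest(p, centers)])
    return {g: total[g] / count[g] for g in count}


def balance(points, groups, centers):
    """Smallest min/max group-count ratio over all nonempty clusters."""
    all_groups = set(groups)
    per_cluster = defaultdict(Counter)
    for p, g in zip(points, groups):
        per_cluster[nearest(p, centers)][g] += 1
    ratios = []
    for counts in per_cluster.values():
        sizes = [counts[g] for g in all_groups]  # Counter yields 0 for absent groups
        ratios.append(min(sizes) / max(sizes))
    return min(ratios)


# Toy data (hypothetical): two groups, one point of each group per cluster.
points = [(0, 0), (10, 0), (0, 4), (10, 4)]
groups = ["A", "A", "B", "B"]

near_a = [(0, 0), (10, 0)]  # centers sit on group A's points
shared = [(0, 2), (10, 2)]  # centers sit midway between the groups

for name, centers in [("near_a", near_a), ("shared", shared)]:
    costs = group_costs(points, groups, centers)
    # Print balance, per-group costs, and the worst group's cost.
    print(name, balance(points, groups, centers), costs, max(costs.values()))
```

Both center sets are perfectly balanced (each cluster holds one point from each group), yet `near_a` leaves group B an average of 4 units from its center while group A pays nothing, whereas `shared` gives both groups a cost of 2. A balance score alone cannot tell these two clusterings apart.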