Individual Preference Stability for Clustering

Paper Title

Individual Preference Stability for Clustering

Authors

Saba Ahmadi, Pranjal Awasthi, Samir Khuller, Matthäus Kleindessner, Jamie Morgenstern, Pattara Sukprasert, Ali Vakilian

Abstract

In this paper, we propose a natural notion of individual preference (IP) stability for clustering, which asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. Our notion can be motivated from several perspectives, including game theory and algorithmic fairness. We study several questions related to our proposed notion. We first show that deciding whether a given data set allows for an IP-stable clustering in general is NP-hard. As a result, we explore the design of efficient algorithms for finding IP-stable clusterings in some restricted metric spaces. We present a polytime algorithm to find a clustering satisfying exact IP-stability on the real line, and an efficient algorithm to find an IP-stable 2-clustering for a tree metric. We also consider relaxing the stability constraint, i.e., every data point should not be too far from its own cluster compared to any other cluster. For this case, we provide polytime algorithms with different guarantees. We evaluate some of our algorithms and several standard clustering approaches on real data sets.
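The stability condition described in the abstract can be illustrated with a small checker. This is a minimal sketch, not the authors' code: the function name `is_ip_stable`, the default absolute-difference distance on the real line, and the conventions that a point's average distance to its own cluster excludes the point itself and that singleton clusters are trivially stable are all assumptions made here, since the abstract does not spell them out.

```python
from typing import Callable, Dict, List, Sequence


def is_ip_stable(
    points: Sequence[float],
    labels: Sequence[int],
    dist: Callable[[float, float], float] = lambda a, b: abs(a - b),
) -> bool:
    """Return True if every point is, on average, at least as close to its
    own cluster as to every other cluster (hypothetical helper)."""
    # Group point indices by cluster label.
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    for i, lab in enumerate(labels):
        # Assumption: the point itself is excluded from its own-cluster average.
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Assumption: a point alone in its cluster counts as stable.
            continue
        own_avg = sum(dist(points[i], points[j]) for j in own) / len(own)

        for other_lab, members in clusters.items():
            if other_lab == lab:
                continue
            other_avg = sum(dist(points[i], points[j]) for j in members) / len(members)
            if own_avg > other_avg:
                return False  # point i would prefer the other cluster
    return True


# Four points on the real line, two clusterings.
data = [0.0, 1.0, 10.0, 11.0]
print(is_ip_stable(data, [0, 0, 1, 1]))  # True: each pair is tight
print(is_ip_stable(data, [0, 1, 0, 1]))  # False: point 0 is closer to cluster 1
```

The check costs quadratic time in the number of points. It only verifies a given clustering; finding an IP-stable clustering is the hard problem the abstract refers to.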
