ExKMC: Expanding Explainable $k$-Means Clustering

Paper Title

ExKMC: Expanding Explainable $k$-Means Clustering

Authors

Nave Frost, Michal Moshkovitz, Cyrus Rashtchian

Abstract

Despite the popularity of explainable AI, there is limited work on effective methods for unsupervised learning. We study algorithms for $k$-means clustering, focusing on a trade-off between explainability and accuracy. Following prior work, we use a small decision tree to partition a dataset into $k$ clusters. This enables us to explain each cluster assignment by a short sequence of single-feature thresholds. While larger trees produce more accurate clusterings, they also require more complex explanations. To allow flexibility, we develop a new explainable $k$-means clustering algorithm, ExKMC, that takes an additional parameter $k' \geq k$ and outputs a decision tree with $k'$ leaves. We use a new surrogate cost to efficiently expand the tree and to label the leaves with one of $k$ clusters. We prove that as $k'$ increases, the surrogate cost is non-increasing, and hence, we trade explainability for accuracy. Empirically, we validate that ExKMC produces a low-cost clustering, outperforming both standard decision tree methods and other algorithms for explainable clustering. An implementation of ExKMC is available at https://github.com/navefr/ExKMC.
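The idea of growing a threshold tree up to $k'$ leaves can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not define the surrogate cost. The sketch assumes that $k$ reference centers are already given (for example from a standard $k$-means run) and that the surrogate cost of a leaf is the sum of squared distances from its points to the single best reference center. The names `leaf_cost`, `best_split`, and `expand_tree` are hypothetical.

```python
import numpy as np

def leaf_cost(X, centers):
    """Assumed surrogate cost of one leaf: squared distance of all its
    points to the single best reference center. Returns (cost, center index)."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # d[i, j]
    per_center = d.sum(axis=0)
    j = int(per_center.argmin())
    return float(per_center[j]), j

def best_split(X, centers):
    """Cheapest single-feature threshold split of one leaf, or None if
    every feature is constant on the leaf."""
    best = None
    for f in range(X.shape[1]):
        # Thresholds at observed values (largest excluded) keep both sides non-empty.
        for t in np.unique(X[:, f])[:-1]:
            mask = X[:, f] <= t
            cost = leaf_cost(X[mask], centers)[0] + leaf_cost(X[~mask], centers)[0]
            if best is None or cost < best[0]:
                best = (cost, f, float(t), mask)
    return best

def expand_tree(X, centers, max_leaves):
    """Greedily split the leaf whose best split lowers the cost the most,
    until max_leaves is reached or no split helps."""
    leaves = [np.arange(len(X))]           # each leaf is an array of point indices
    history = [leaf_cost(X, centers)[0]]   # surrogate cost after each expansion
    while len(leaves) < max_leaves:
        best_gain, choice = 0.0, None
        for i, idx in enumerate(leaves):
            split = best_split(X[idx], centers)
            if split is None:
                continue
            gain = leaf_cost(X[idx], centers)[0] - split[0]
            if gain > best_gain:
                best_gain, choice = gain, (i, split[3])
        if choice is None:                 # no split strictly reduces the cost
            break
        i, mask = choice
        idx = leaves.pop(i)
        leaves += [idx[mask], idx[~mask]]
        history.append(history[-1] - best_gain)
    labels = np.empty(len(X), dtype=int)
    for idx in leaves:                     # label each leaf with its best center
        labels[idx] = leaf_cost(X[idx], centers)[1]
    return labels, history

# Toy data: three well-separated groups and their means as reference centers.
X = np.array([[0, 0], [0, 1], [1, 0],
              [5, 5], [5, 6], [6, 5],
              [0, 5], [0, 6], [1, 6]], dtype=float)
centers = np.array([X[0:3].mean(axis=0), X[3:6].mean(axis=0), X[6:9].mean(axis=0)])
labels, history = expand_tree(X, centers, max_leaves=5)
print(labels, history)
```

Under this assumed cost, splitting a leaf can never increase the total, because both children may keep the parent's center. A split only lowers the cost when its two children prefer different centers, so the loop may stop before reaching `max_leaves`.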
