Diffeomorphic Counterfactuals with Generative Models

Title

Diffeomorphic Counterfactuals with Generative Models

Authors

Ann-Kathrin Dombrowski, Jan E. Gerken, Klaus-Robert Müller, Pan Kessel

Abstract

Counterfactuals can explain classification decisions of neural networks in a human interpretable way. We propose a simple but effective method to generate such counterfactuals. More specifically, we perform a suitable diffeomorphic coordinate transformation and then perform gradient ascent in these coordinates to find counterfactuals which are classified with great confidence as a specified target class. We propose two methods to leverage generative models to construct such suitable coordinate systems that are either exactly or approximately diffeomorphic. We analyze the generation process theoretically using Riemannian differential geometry and validate the quality of the generated counterfactuals using various qualitative and quantitative measures.
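
The core idea of the abstract, gradient ascent carried out in transformed coordinates rather than directly in input space, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: a hand-written invertible map `g` stands in for the generative model, a two-dimensional logistic classifier stands in for the neural network, and the names `g`, `g_inverse`, `classifier_prob`, and `find_counterfactual` are hypothetical.

```python
import math

# Hypothetical stand-in for a generative model: a smooth invertible map
# from latent coordinates z to data coordinates x (a toy diffeomorphism).
def g(z):
    z1, z2 = z
    return (z1, z2 + z1 ** 2)

def g_inverse(x):
    x1, x2 = x
    return (x1, x2 - x1 ** 2)

# Hypothetical stand-in for the classifier: probability of the target class
# under a logistic model with fixed weights W and bias B.
W = (1.0, 1.0)
B = 0.0

def classifier_prob(x):
    a = W[0] * x[0] + W[1] * x[1] + B
    return 1.0 / (1.0 + math.exp(-a))

def grad_log_prob_x(x):
    # d/dx log sigmoid(w.x + b) = (1 - p) * w
    p = classifier_prob(x)
    return ((1.0 - p) * W[0], (1.0 - p) * W[1])

def find_counterfactual(x0, target_conf=0.99, lr=0.1, max_steps=10000):
    """Gradient ascent on log p(target | g(z)) with respect to z."""
    z = g_inverse(x0)  # start from the latent code of the original input
    for _ in range(max_steps):
        x = g(z)
        if classifier_prob(x) >= target_conf:
            break
        gx = grad_log_prob_x(x)
        # Chain rule: grad_z = J_g(z)^T grad_x, with J_g = [[1, 0], [2*z1, 1]]
        gz = (gx[0] + 2.0 * z[0] * gx[1], gx[1])
        z = (z[0] + lr * gz[0], z[1] + lr * gz[1])
    return g(z)

x0 = (-1.0, -1.0)  # original input, classified as the non-target class
x_cf = find_counterfactual(x0)
print("original confidence:", classifier_prob(x0))
print("counterfactual:", x_cf, "confidence:", classifier_prob(x_cf))
```

Because the update is applied to `z` and then mapped through `g`, the search path in data space is shaped by the coordinate transformation; in the paper's setting that transformation comes from a generative model rather than from a fixed formula as in this sketch.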
