Paper Title

Explainable Image Classification with Evidence Counterfactual

Authors

Tom Vermeire, David Martens

Abstract

The complexity of state-of-the-art modeling techniques for image classification impedes the ability to explain model predictions in an interpretable way. Existing explanation methods generally create importance rankings in terms of pixels or pixel groups. However, the resulting explanations lack an optimal size, do not account for feature dependence, and relate to only one class. Counterfactual explanation methods are considered promising for explaining complex model decisions, since they are associated with a high degree of human interpretability. In this paper, SEDC is introduced as a model-agnostic, instance-level explanation method for image classification that yields visual counterfactual explanations. For a given image, SEDC searches for a small set of segments that, when removed, alter the classification. As image classification tasks are typically multiclass problems, SEDC-T is proposed as an alternative method that allows a target counterfactual class to be specified. We compare SEDC(-T) with popular feature importance methods such as LRP, LIME, and SHAP, and describe how the aforementioned importance-ranking issues are addressed. Moreover, concrete examples and experiments illustrate the potential of our approach (1) to gain trust and insight, and (2) to obtain input for model improvement by explaining misclassifications.
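
The abstract describes SEDC as a search for a small set of segments whose removal changes the classification. Below is a minimal sketch of what such a greedy search could look like in Python; it is an illustration under assumptions, not the authors' reference implementation. The names `predict_proba`, `replace_value`, and `max_size` are hypothetical, the segmentation is assumed to be precomputed (e.g., with `skimage.segmentation.slic`), and segment "removal" is approximated here by constant-value imputation (mean imputation or blurring are common alternatives).

```python
# Sketch of an SEDC-style greedy counterfactual search (assumptions above).
import numpy as np

def sedc_explain(image, segments, predict_proba, replace_value=0.0,
                 max_size=8, target=None):
    """Greedily grow a set of segments whose removal changes the prediction.

    With target=None any class change counts (SEDC-like); passing a class
    index requires the prediction to become that class (SEDC-T-like).
    Returns the set of segment ids, or None if no counterfactual is found
    within max_size removed segments.
    """
    original_class = int(np.argmax(predict_proba(image)))
    seg_ids = [int(s) for s in np.unique(segments)]
    removed = set()

    def perturbed(seg_set):
        # "Remove" the chosen segments by overwriting their pixels.
        out = image.copy()
        out[np.isin(segments, list(seg_set))] = replace_value
        return out

    for _ in range(max_size):
        best_seg, best_score = None, None
        for s in seg_ids:
            if s in removed:
                continue
            probs = predict_proba(perturbed(removed | {s}))
            # SEDC: push the original class down; SEDC-T: push the target up.
            score = -probs[original_class] if target is None else probs[target]
            if best_score is None or score > best_score:
                best_seg, best_score = s, score
        if best_seg is None:
            break  # all segments exhausted
        removed.add(best_seg)
        new_class = int(np.argmax(predict_proba(perturbed(removed))))
        if (target is None and new_class != original_class) or new_class == target:
            return removed  # counterfactual found
    return None
```

Passing `target=None` mimics the SEDC setting, where any class change terminates the search, while supplying a class index mimics SEDC-T, where the counterfactual class is specified in advance.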
