Paper title
Declarative Approaches to Counterfactual Explanations for Classification
Paper authors
Abstract
We propose answer-set programs that specify and compute counterfactual interventions on entities that are input to a classification model. In relation to the outcome of the model, the resulting counterfactual entities serve as a basis for the definition and computation of causality-based explanation scores for the feature values in the entity under classification, namely "responsibility scores". The approach and the programs can be applied with black-box models, and also with models that can be specified as logic programs, such as rule-based classifiers. The main focus of this work is on the specification and computation of "best" counterfactual entities, i.e., those that lead to maximum responsibility scores. From them one can read off the explanations as the maximum-responsibility feature values in the original entity. We also extend the programs to bring semantic or domain knowledge into the picture. We show how the approach could be extended by means of probabilistic methods, and how the underlying probability distributions could be modified through the use of constraints. Several examples of programs written in the syntax of the DLV ASP-solver, and run with it, are shown.
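
To make the idea of counterfactual interventions and responsibility scores concrete, the following is a minimal brute-force sketch in Python rather than an answer-set program. It is not the paper's method or its DLV encoding. The abstract does not define the score, so the sketch assumes the usual causality-style form: a feature value gets score 1/(1+k), where k is the size of the smallest set of other feature values (a contingency set) that can be changed without flipping the label, and after which changing the feature value itself does flip it. The function names, the toy classifier, and the feature domains are all hypothetical.

```python
from itertools import combinations, product


def _flips_with_contingency(entity, domains, classify, i, gamma):
    """True if some change to the features in `gamma` keeps the label,
    while additionally changing feature `i` flips it."""
    original = classify(entity)
    # Alternative (non-original) values for each contingency feature.
    alternatives = [[v for v in domains[j] if v != entity[j]] for j in gamma]
    for new_vals in product(*alternatives):
        changed = list(entity)
        for j, v in zip(gamma, new_vals):
            changed[j] = v
        if classify(tuple(changed)) != original:
            continue  # the contingency alone must not flip the label
        for v in domains[i]:
            if v == entity[i]:
                continue
            candidate = list(changed)
            candidate[i] = v
            if classify(tuple(candidate)) != original:
                return True
    return False


def responsibility_scores(entity, domains, classify):
    """Assumed score: 1 / (1 + size of a minimum contingency set), else 0."""
    n = len(entity)
    scores = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        scores[i] = 0.0
        for k in range(len(others) + 1):  # try the smallest sets first
            if any(_flips_with_contingency(entity, domains, classify, i, g)
                   for g in combinations(others, k)):
                scores[i] = 1.0 / (1 + k)
                break
    return scores


# Hypothetical black-box classifier over three binary features.
def toy_classifier(e):
    return int(e[0] == 1 and (e[1] == 1 or e[2] == 1))


toy_domains = [(0, 1), (0, 1), (0, 1)]
toy_scores = responsibility_scores((1, 1, 1), toy_domains, toy_classifier)
print(toy_scores)
```

In this toy case, changing feature 0 alone flips the label, so it gets the maximum score. Features 1 and 2 each need the other one changed first, so they score lower. The search is exponential in the number of features, which is one reason a declarative solver is attractive for the real problem.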