Paper Title

Evaluations and Methods for Explanation through Robustness Analysis

Authors

Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh

Abstract

Feature-based explanations, which provide the importance of each feature towards the model prediction, are arguably among the most intuitive ways to explain a model. In this paper, we establish a novel set of evaluation criteria for such feature-based explanations by robustness analysis. In contrast to existing evaluations, which require us to specify some way to "remove" features and could thereby inevitably introduce biases and artifacts, we make use of the subtler notion of smaller adversarial perturbations. By optimizing towards our proposed evaluation criteria, we obtain new explanations that are loosely necessary and sufficient for a prediction. We further extend the explanation to extract the set of features that would move the current prediction to a target class by adopting a targeted adversarial attack for the robustness analysis. Through experiments across multiple domains and a user study, we validate the usefulness of our evaluation criteria and our derived explanations.
