Paper Title
"Is your explanation stable?": A Robustness Evaluation Framework for Feature Attribution
Paper Authors
Paper Abstract
Understanding the decision process of neural networks is hard. One vital approach to explanation is to attribute the model's decision to pivotal features. Although many algorithms have been proposed, most of them focus solely on improving faithfulness to the model. However, real environments contain random noise, which can cause large fluctuations in the explanations. More seriously, recent works show that explanation algorithms are vulnerable to adversarial attacks. All of this makes explanations hard to trust in real scenarios. To bridge this gap, we propose a model-agnostic method, \emph{Median Test for Feature Attribution} (MeTFA), to quantify the uncertainty and increase the stability of explanation algorithms with theoretical guarantees. MeTFA provides two functions: (1) it tests whether a feature is significantly important or unimportant and generates a MeTFA-significant map to visualize the result; (2) it computes a confidence interval for a feature attribution score and generates a MeTFA-smoothed map to increase the stability of the explanation. Experiments show that MeTFA improves the visual quality of explanations and significantly reduces instability while maintaining faithfulness. To quantitatively evaluate the faithfulness of an explanation under different noise settings, we further propose several robust faithfulness metrics. Experimental results show that MeTFA-smoothed explanations significantly increase robust faithfulness. In addition, we use two scenarios to demonstrate MeTFA's potential in applications. First, when applied to the SOTA explanation method for locating context bias in semantic segmentation models, MeTFA-significant explanations use far smaller regions to maintain 99\%+ faithfulness. Second, when tested against different explanation-oriented attacks, MeTFA helps defend against both vanilla and adaptive adversarial attacks on explanations.
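To make the procedure concrete, below is a minimal sketch of how such a median test over noisy attribution maps could look, assuming Gaussian input noise, a binomial sign test for significance, and an order-statistic confidence interval for the median. The names `attr_fn`, `noise_std`, and `threshold` are hypothetical placeholders; the paper's exact noise model and test statistics may differ.

```python
# A minimal sketch in the spirit of MeTFA, not the authors' implementation.
import numpy as np
from scipy.stats import binom

def metfa_sketch(attr_fn, x, n=30, noise_std=0.1, alpha=0.05, threshold=0.5):
    """attr_fn maps an input array to an attribution map of the same shape."""
    # Draw n attribution maps under random input perturbations.
    maps = np.stack([attr_fn(x + np.random.normal(0.0, noise_std, x.shape))
                     for _ in range(n)])              # shape: (n, *feature_shape)

    # (1) MeTFA-significant map: per-feature sign test on the median.
    # Under H0 "median attribution <= threshold", the count of samples above
    # the threshold is stochastically dominated by Binomial(n, 0.5).
    above = (maps > threshold).sum(axis=0)
    p_important = binom.sf(above - 1, n, 0.5)         # P[Bin(n,.5) >= above]
    p_unimportant = binom.cdf(above, n, 0.5)          # P[Bin(n,.5) <= above]
    significant = np.zeros(maps.shape[1:], dtype=int) # 0 = undecided
    significant[p_important <= alpha] = 1             # significantly important
    significant[p_unimportant <= alpha] = -1          # significantly unimportant

    # (2) MeTFA-smoothed map with a confidence interval for the median:
    # the order-statistic interval [X_(k), X_(n-k+1)] covers the median with
    # probability P[k <= Bin(n,.5) <= n-k]; pick the largest k that reaches
    # the 1-alpha coverage level (narrowest valid interval).
    sorted_maps = np.sort(maps, axis=0)
    k = next(k for k in range(n // 2, 0, -1)
             if binom.cdf(n - k, n, 0.5) - binom.cdf(k - 1, n, 0.5) >= 1 - alpha)
    ci = (sorted_maps[k - 1], sorted_maps[n - k])     # lower / upper CI bounds
    smoothed = np.median(maps, axis=0)                # MeTFA-smoothed map
    return significant, smoothed, ci
```

For instance, calling `metfa_sketch(lambda z: some_saliency_method(model, z), image)` would return a ternary significance map, the median-smoothed attribution, and per-feature confidence bounds; the order-statistic interval is what gives the distribution-free theoretical guarantee mentioned in the abstract.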