Paper Title
Towards Better Understanding Attribution Methods
Paper Authors
Paper Abstract
Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black box nature. To overcome this, various post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions. Evaluating such methods is challenging since no ground truth attributions exist. We thus propose three novel evaluation schemes to more reliably measure the faithfulness of those methods, to make comparisons between them more fair, and to make visual inspection more systematic. To address faithfulness, we propose a novel evaluation setting (DiFull) in which we carefully control which parts of the input can influence the output in order to distinguish possible from impossible attributions. To address fairness, we note that different methods are applied at different layers, which skews any comparison, and so evaluate all methods on the same layers (ML-Att) and discuss how this impacts their performance on quantitative metrics. For more systematic visualizations, we propose a scheme (AggAtt) to qualitatively evaluate the methods on complete datasets. We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods. Finally, we propose a post-processing smoothing step that significantly improves the performance of some attribution methods, and discuss its applicability.
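The abstract only names the post-processing smoothing step without specifying it. As a minimal sketch of what smoothing an attribution map could look like (assuming a simple mean/box filter applied to a 2D per-pixel attribution map; the kernel choice and `kernel_size` parameter here are hypothetical, not taken from the paper):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def smooth_attribution(attr_map, kernel_size=9):
    """Smooth a 2D attribution map with a mean (box) filter.

    attr_map: 2D numpy array of per-pixel attributions (H x W).
    kernel_size: side length of the smoothing window (hypothetical default).
    """
    return uniform_filter(attr_map, size=kernel_size, mode="constant")

# Usage example: smooth a noisy pixel-level attribution map.
rng = np.random.default_rng(0)
attr = rng.normal(size=(224, 224))   # stand-in for, e.g., a gradient-based attribution
smoothed = smooth_attribution(attr, kernel_size=9)
print(attr.std(), smoothed.std())    # smoothing suppresses high-frequency noise
```

Such smoothing spreads fine-grained, pixel-level attributions over local neighborhoods, which is one plausible reason it could help noisy methods on localization-style metrics; the paper's actual kernel and parameters may differ.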