Paper Title

RES: A Robust Framework for Guiding Visual Explanation

Paper Authors

Yuyang Gao, Tong Steven Sun, Guangji Bai, Siyi Gu, Sungsoo Ray Hong, Liang Zhao

Paper Abstract

Despite the fast progress of explanation techniques in modern Deep Neural Networks (DNNs), where the main focus is on handling "how to generate the explanations", advanced research questions that examine the quality of the explanation itself (e.g., "whether the explanations are accurate") and improve the explanation quality (e.g., "how to adjust the model to generate more accurate explanations when explanations are inaccurate") are still relatively under-explored. To guide the model toward better explanations, techniques in explanation supervision, which add supervision signals on the model explanation, have started to show promising effects on improving both the generalizability and the intrinsic interpretability of Deep Neural Networks. However, the research on supervising explanations, especially in vision-based applications represented through saliency maps, is in its early stage due to several inherent challenges: 1) inaccuracy of the human explanation annotation boundary, 2) incompleteness of the human explanation annotation region, and 3) inconsistency of the data distribution between human annotation and model explanation maps. To address these challenges, we propose a generic RES framework for guiding visual explanation by developing a novel objective that handles inaccurate boundaries, incomplete regions, and inconsistent distributions of human annotations, with a theoretical justification of model generalizability. Extensive experiments on two real-world image datasets demonstrate the effectiveness of the proposed framework in enhancing both the reasonability of the explanation and the performance of the backbone DNN model.
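To make the idea of explanation supervision concrete, below is a minimal PyTorch sketch of a joint objective that combines the usual task loss with a penalty aligning the model's saliency map to a human annotation mask. This is a generic illustration, not the RES objective from the paper: the function name `explanation_supervision_loss`, the input-gradient saliency (a simple stand-in for Grad-CAM-style maps), the plain MSE explanation term, and the weight `lam` are all assumptions for the sketch. In particular, a plain MSE treats the human mask as exact and complete, which is precisely what the paper's more robust objective is designed to avoid.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def explanation_supervision_loss(model, images, labels, human_masks, lam=1.0):
    """Joint objective: task loss + a term aligning the model's saliency
    map with a (possibly noisy) human annotation mask.

    Illustrative sketch only -- the saliency definition and the MSE
    penalty are assumptions, not the paper's actual RES objective.
    """
    images = images.clone().requires_grad_(True)
    logits = model(images)
    task_loss = F.cross_entropy(logits, labels)

    # Saliency map: gradient of the true-class score w.r.t. the input,
    # aggregated over channels and normalized per image.
    class_scores = logits.gather(1, labels.unsqueeze(1)).sum()
    grads, = torch.autograd.grad(class_scores, images, create_graph=True)
    saliency = grads.abs().sum(dim=1, keepdim=True)               # (B, 1, H, W)
    saliency = saliency / (saliency.amax(dim=(2, 3), keepdim=True) + 1e-8)

    # Explanation term: naive pixel-wise agreement with the human mask.
    # A robust variant would tolerate inaccurate boundaries and leave
    # unannotated regions unconstrained, per the challenges above.
    exp_loss = F.mse_loss(saliency, human_masks)

    return task_loss + lam * exp_loss

# Usage sketch (model, shapes, and mask values are illustrative):
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 32 * 32, 10),
)
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))
masks = torch.rand(4, 1, 32, 32)   # human annotation maps in [0, 1]
loss = explanation_supervision_loss(model, images, labels, masks)
loss.backward()                     # create_graph=True lets this flow
                                    # through the saliency computation
```

Note the design point: because the explanation term depends on input gradients, `create_graph=True` is needed so the joint loss can backpropagate through the saliency map into the model parameters, which is what lets the supervision signal actually reshape the explanation rather than just measure it.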
