Paper Title

Jacobian Norm with Selective Input Gradient Regularization for Improved and Interpretable Adversarial Defense

Paper Authors

Deyin Liu, Lin Wu, Haifeng Zhao, Farid Boussaid, Mohammed Bennamoun, Xianghua Xie

Paper Abstract

Deep neural networks (DNNs) are known to be vulnerable to adversarial examples crafted with imperceptible perturbations: a small change to an input image can induce a misclassification, threatening the reliability of deployed deep-learning systems. Adversarial training (AT) is often adopted to improve robustness by training on a mixture of corrupted and clean data. However, most AT-based methods are ineffective against transferred adversarial examples, which are generated to fool a wide spectrum of defense models, and thus cannot satisfy the generalization requirements of real-world scenarios. Moreover, an adversarially trained defense model generally cannot produce interpretable predictions on perturbed inputs, whereas domain experts require a highly interpretable robust model to understand the behaviour of a DNN. In this work, we propose a novel approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which encourages linearized robustness through Jacobian normalization and also regularizes perturbation-based saliency maps so that the model's predictions remain interpretable. As such, we achieve both improved defense and high interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and that the predictions of the neural network are easy to interpret.
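To make the two regularizers concrete, below is a minimal PyTorch sketch of a training loss that combines cross-entropy with (a) a random-projection estimate of the input-output Jacobian norm and (b) an input-gradient penalty. This is an illustration of the general technique only, not the authors' exact J-SIGR objective: the function name `jsigr_style_loss` and the weights `lambda_jac` and `lambda_grad` are hypothetical, and the selective (saliency-guided) masking described in the paper is omitted, so the input-gradient term here is applied to all pixels.

```python
import torch
import torch.nn.functional as F


def jsigr_style_loss(model, x, y, lambda_jac=0.01, lambda_grad=0.1):
    """Cross-entropy + Jacobian-norm penalty + input-gradient penalty.

    Illustrative sketch only; `lambda_jac` and `lambda_grad` are
    hypothetical weights, not values from the paper.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # One-sample random-projection estimate of the squared Frobenius norm
    # of the input-output Jacobian (up to a constant factor): draw a unit
    # vector v and compute ||v^T J||^2 with a single backward pass.
    v = torch.randn_like(logits)
    v = v / v.norm(dim=1, keepdim=True)
    (jv,) = torch.autograd.grad(logits, x, grad_outputs=v, create_graph=True)
    jac_penalty = jv.pow(2).flatten(1).sum(dim=1).mean()

    # Input-gradient penalty: gradient of the classification loss w.r.t.
    # the input. J-SIGR applies this selectively via saliency maps; here
    # it is applied uniformly for simplicity.
    (g,) = torch.autograd.grad(ce, x, create_graph=True)
    grad_penalty = g.pow(2).flatten(1).sum(dim=1).mean()

    return ce + lambda_jac * jac_penalty + lambda_grad * grad_penalty
```

Because both penalties are built with `create_graph=True`, calling `.backward()` on the returned loss performs a double backward pass, so a training step costs roughly two to three times that of plain cross-entropy training.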
