Paper Title

Using Single-Step Adversarial Training to Defend Iterative Adversarial Examples

Authors

Liu, Guanxiong; Khalil, Issa; Khreishah, Abdallah

Abstract

Adversarial examples have become one of the largest challenges facing machine learning models, especially neural network classifiers. These adversarial examples break the assumption of an attack-free scenario and fool state-of-the-art (SOTA) classifiers with perturbations that are insignificant to humans. So far, researchers have achieved great progress in utilizing adversarial training as a defense. However, its overwhelming computational cost degrades its applicability, and little has been done to overcome this issue. Single-step adversarial training methods have been proposed as computationally viable solutions; however, they still fail to defend against iterative adversarial examples. In this work, we first experimentally analyze several different SOTA defense methods against adversarial examples. Then, based on observations from these experiments, we propose a novel single-step adversarial training method that can defend against both single-step and iterative adversarial examples. Lastly, through extensive evaluations, we demonstrate that our proposed method outperforms the SOTA single-step and iterative adversarial training defenses. Compared with ATDA (a single-step method) on the CIFAR10 dataset, our proposed method achieves a 35.67% improvement in test accuracy and a 19.14% reduction in training time. Compared with methods that use BIM or Madry examples (iterative methods) on the CIFAR10 dataset, it saves up to 76.03% in training time with less than 3.78% degradation in test accuracy.
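
The distinction the abstract draws between single-step and iterative adversarial examples can be illustrated with a minimal sketch. This is not the paper's proposed training method. It assumes a toy logistic-regression classifier with made-up weights, and it uses a generic signed-gradient single step (FGSM-style) and a generic clipped multi-step loop (BIM-style). All function names, parameters, and values below are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def single_step_attack(w, b, x, y, eps):
    """Single-step example: one signed-gradient step of size eps."""
    g = loss_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def iterative_attack(w, b, x, y, eps, alpha, steps):
    """Iterative example: repeated steps of size alpha, clipped to the eps-ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_gradient(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical model and input, for illustration only.
w, b = [2.0, -1.0], 0.1
x, y = [0.5, 0.3], 1
eps = 0.1

x_single = single_step_attack(w, b, x, y, eps)
x_iter = iterative_attack(w, b, x, y, eps, alpha=0.025, steps=10)

print("clean probability of true class:      ", predict(w, b, x))
print("single-step probability of true class:", predict(w, b, x_single))
print("iterative probability of true class:  ", predict(w, b, x_iter))
```

The iterative attack needs one gradient computation per step, while the single-step attack needs only one in total. That difference is the source of the training-cost gap the abstract refers to when such examples are generated during adversarial training. On a linear toy model like this one the two attacks end up at essentially the same point; on a deep network they generally do not.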
