Paper Title
Towards Rapid and Robust Adversarial Training with One-Step Attacks
Paper Authors
Abstract
Adversarial training is the most successful empirical method for increasing the robustness of neural networks against adversarial attacks. However, the most effective approaches, such as training with Projected Gradient Descent (PGD), are accompanied by high computational complexity. In this paper, we present two ideas that, in combination, enable adversarial training with the computationally less expensive Fast Gradient Sign Method (FGSM). First, we add uniform noise to the initial data point of the FGSM attack, which creates a wider variety of adversaries, thus prohibiting overfitting to one particular perturbation bound. Further, we add a learnable regularization step prior to the neural network, which we call Pixelwise Noise Injection Layer (PNIL). Inputs propagated through the PNIL are resampled from a learned Gaussian distribution. The regularization induced by the PNIL prevents the model from learning to obfuscate its gradients, a factor that hindered prior approaches from successfully applying one-step methods for adversarial training. We show that noise injection in conjunction with FGSM-based adversarial training achieves comparable results to adversarial training with PGD while being considerably faster. Moreover, we outperform PGD-based adversarial training by combining noise injection and PNIL.
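The two ideas in the abstract can be sketched in NumPy. This is an illustrative reconstruction, not the paper's implementation: the function and class names, the `grad_fn` interface, and the per-pixel learnable log-std parameterization of the PNIL (via the reparameterization trick) are all assumptions, since the abstract only states that the FGSM start point receives uniform noise and that PNIL resamples inputs from a learned Gaussian.

```python
import numpy as np


def fgsm_random_init(x, grad_fn, eps, rng=None):
    """One-step FGSM attack with uniform random initialization (sketch).

    Starts from a point uniformly sampled inside the eps-ball around x,
    takes one signed gradient step, and projects back into the eps-ball.
    `grad_fn` is an assumed interface returning the loss gradient w.r.t.
    its input.
    """
    rng = rng or np.random.default_rng(0)
    # Uniform noise widens the variety of adversaries seen in training.
    x_init = x + rng.uniform(-eps, eps, size=x.shape)
    # Single FGSM step from the perturbed starting point.
    x_adv = x_init + eps * np.sign(grad_fn(x_init))
    # Keep the adversary within the eps-ball around the original input.
    return np.clip(x_adv, x - eps, x + eps)


class PixelwiseNoiseInjectionLayer:
    """PNIL-style layer (hypothetical parameterization).

    Assumes a learnable per-pixel log standard deviation; the forward
    pass resamples each pixel as y ~ N(x, sigma^2), written with the
    reparameterization trick so the layer remains differentiable.
    """

    def __init__(self, shape, rng=None):
        self.log_sigma = np.zeros(shape)  # learnable parameter
        self.rng = rng or np.random.default_rng(0)

    def forward(self, x):
        noise = self.rng.standard_normal(x.shape)
        return x + np.exp(self.log_sigma) * noise
```

In an FGSM-based training loop, `fgsm_random_init` would replace the multi-step PGD inner attack, and the PNIL would sit in front of the network so its stochastic resampling regularizes the gradients the attack relies on.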