Paper Title
Beneficial Perturbations Network for Defending Adversarial Examples
Paper Authors
Paper Abstract
Deep neural networks can be fooled by adversarial attacks: adding small, carefully computed adversarial perturbations to clean inputs can cause misclassification in state-of-the-art machine learning models. The reason is that neural networks fail to accommodate the distribution drift of the input data caused by adversarial perturbations. Here, we present a new solution, the Beneficial Perturbation Network (BPN), which defends against adversarial attacks by fixing this distribution drift. During training, BPN generates and leverages beneficial perturbations (in a sense, the opposite of the well-known adversarial perturbations) by adding new, out-of-network biasing units. The biasing units influence the parameter space of the network to preempt and neutralize future adversarial perturbations on input data samples. To achieve this, BPN creates reverse adversarial attacks during training, at very little cost, by recycling the training gradients that have already been computed. The reverse attacks are captured by the biasing units, and the biases can in turn effectively defend against future adversarial examples. Reverse attacks are a shortcut: they affect the network's parameters without requiring the instantiation of adversarial examples that could assist training. We provide comprehensive empirical evidence showing that 1) BPN is robust to adversarial examples and is far more memory- and computation-efficient than classical adversarial training; 2) BPN can defend against adversarial examples with negligible additional computation and parameter costs compared to training only on clean examples; 3) BPN hurts accuracy on clean examples much less than classical adversarial training; 4) BPN can improve the generalization of the network; and 5) BPN trained only with the Fast Gradient Sign Method (FGSM) attack can generalize to defend against PGD attacks.
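To make the mechanism concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: an out-of-network biasing unit is attached to a layer and, at each training step, receives a reverse FGSM-style update that reuses the gradients already computed for the weights. The class name BiasedLinear, the update rule, and the step size epsilon are illustrative assumptions for this sketch, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasedLinear(nn.Module):
    """Linear layer augmented with an out-of-network biasing unit that
    stores beneficial perturbations in activation space (illustrative)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.bias_unit = nn.Parameter(torch.zeros(out_features))  # biasing unit

    def forward(self, x):
        return self.linear(x) + self.bias_unit


def train_step(model, x, y, optimizer, epsilon=0.01):
    """One training step: ordinary gradient descent on the network weights,
    plus a reverse (beneficial) FGSM-style update on the biasing units that
    recycles the gradients from the same backward pass."""
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()      # gradients for both the weights and the biasing units
    optimizer.step()     # usual update of the ordinary network weights

    # Beneficial perturbation: push each biasing unit in the direction that
    # DECREASES the loss, i.e., the reverse of an FGSM adversarial step.
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, BiasedLinear) and m.bias_unit.grad is not None:
                m.bias_unit -= epsilon * m.bias_unit.grad.sign()
    return loss.item()


# Example usage: only the ordinary weights go to the optimizer; the biasing
# units are updated exclusively through the reverse-FGSM rule above.
model = nn.Sequential(nn.Flatten(), BiasedLinear(28 * 28, 10))
optimizer = torch.optim.SGD(
    [p for n, p in model.named_parameters() if "bias_unit" not in n], lr=0.1)
```

The key design choice illustrated here is that the biasing units never see actual adversarial examples: they are driven solely by gradients that the normal training pass produces anyway, which is why the abstract describes the reverse attacks as a near-zero-cost shortcut.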