Paper Title
Minimax Defense against Gradient-based Adversarial Attacks
Paper Authors
Paper Abstract
State-of-the-art adversarial attacks are aimed at neural network classifiers. By default, neural networks use gradient descent to minimize their loss function. Gradient-based adversarial attacks exploit the gradient of a classifier's loss function to generate adversarially perturbed images. We pose the question of whether another type of optimization could give neural network classifiers an edge. Here, we introduce a novel approach that uses minimax optimization to foil gradient-based adversarial attacks. Our minimax classifier is the discriminator of a generative adversarial network (GAN) that plays a minimax game with the GAN generator. In addition, our GAN generator projects all points onto a manifold that is different from the original manifold, since the original manifold might be the cause of adversarial attacks. To measure the performance of our minimax defense, we use three adversarial attacks - Carlini-Wagner (CW), DeepFool, and Fast Gradient Sign Method (FGSM) - on three datasets: MNIST, CIFAR-10, and German Traffic Sign (TRAFFIC). Against CW attacks, our minimax defense achieves 98.07% (MNIST - default 98.93%), 73.90% (CIFAR-10 - default 83.14%), and 94.54% (TRAFFIC - default 96.97%). Against DeepFool attacks, our minimax defense achieves 98.87% (MNIST), 76.61% (CIFAR-10), and 94.57% (TRAFFIC). Against FGSM attacks, we achieve 97.01% (MNIST), 76.79% (CIFAR-10), and 81.41% (TRAFFIC). Our minimax adversarial approach represents a significant shift in defense strategy for neural network classifiers.
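As background for the gradient-based attacks the abstract evaluates against, FGSM perturbs an input in the direction of the sign of the loss gradient with respect to that input. The following is a minimal NumPy sketch on a toy logistic classifier, not the paper's implementation; all weights, inputs, and the epsilon value are hypothetical illustrations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fixed logistic classifier: p(y=1|x) = sigmoid(w.x + b)
w = np.array([1.0, -2.0, 0.5])
b = 0.1

x = np.array([0.2, -0.1, 0.4])  # clean input
y = 1                           # true label

p = sigmoid(w @ x + b)          # confidence in the true class

# Cross-entropy loss L = -y*log(p) - (1-y)*log(1-p);
# its gradient w.r.t. the input is dL/dx = (p - y) * w.
grad_x = (p - y) * w

# FGSM step: move the input along the sign of the loss gradient.
eps = 0.25
x_adv = x + eps * np.sign(grad_x)

p_adv = sigmoid(w @ x_adv + b)  # confidence drops on the perturbed input
```

Even this one-step perturbation flips the toy classifier's decision (p drops from about 0.67 to about 0.46), which illustrates why the abstract targets the loss gradient as the attack surface a minimax-trained discriminator aims to deny.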