Paper Title

Adversarial Attacks on Convolutional Neural Networks in Facial Recognition Domain

Authors

Yigit Alparslan, Ken Alparslan, Jeremy Keim-Shenk, Shweta Khade, Rachel Greenstadt

Abstract

Numerous recent studies have demonstrated how Deep Neural Network (DNN) classifiers can be fooled by adversarial examples, in which an attacker adds perturbations to an original sample, causing the classifier to misclassify the sample. Adversarial attacks that render DNNs vulnerable in real life represent a serious threat to autonomous vehicles, malware filters, and biometric authentication systems. In this paper, we apply the Fast Gradient Sign Method (FGSM) to introduce perturbations to a facial image dataset and then test the output on a different classifier that we trained ourselves, to analyze the transferability of this method. Next, we craft a variety of different black-box attack algorithms on a facial image dataset, assuming minimal adversarial knowledge, to further assess the robustness of DNNs in facial recognition. While experimenting with different image distortion techniques, we focus on modifying a single optimal pixel by a large amount, modifying all pixels by a smaller amount, or combining these two attack approaches. While our single-pixel attacks achieved about a 15% average decrease in classifier confidence level for the actual class, the all-pixel attacks were more successful and achieved up to an 84% average decrease in confidence, along with an 81.6% misclassification rate, in the case of the attack that we tested with the highest levels of perturbation. Even with these high levels of perturbation, the face images remained identifiable to a human. Understanding how these noised and perturbed images baffle classification algorithms can yield valuable advances in the training of DNNs against defense-aware adversarial attacks, as well as in adaptive noise reduction techniques. We hope our research helps advance the study of adversarial attacks on DNNs and of defensive mechanisms to counteract them, particularly in the facial recognition domain.
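
The abstract describes two families of perturbations: a white-box FGSM step and black-box single-pixel / all-pixel distortions. The sketch below is not the authors' code; it is a minimal illustration assuming a PyTorch image classifier with inputs in [0, 1], and the `epsilon`, `magnitude`, and `n_trials` values are illustrative placeholders.

```python
# Minimal sketch (illustrative, not the paper's implementation) of an FGSM
# perturbation plus simple black-box single-pixel and all-pixel distortions.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """White-box FGSM: shift each pixel by +/- epsilon along the sign of the
    loss gradient with respect to the input image."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

def all_pixel_noise(x, magnitude=0.1):
    """Black-box all-pixel attack: add a small +/- magnitude change to every pixel."""
    noise = magnitude * torch.sign(torch.randn_like(x))
    return (x + noise).clamp(0.0, 1.0)

def single_pixel_attack(model, x, y, n_trials=100, value=1.0):
    """Black-box single-pixel attack: try random pixel positions and keep the
    one that most reduces the classifier's confidence in the true class."""
    best_x, best_conf = x, model(x).softmax(dim=1)[0, y].item()
    _, c, h, w = x.shape  # expects a single image of shape (1, C, H, W)
    for _ in range(n_trials):
        xp = x.clone()
        i, j = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
        xp[0, :, i, j] = value  # overwrite one pixel by a large amount
        conf = model(xp).softmax(dim=1)[0, y].item()
        if conf < best_conf:
            best_x, best_conf = xp, conf
    return best_x, best_conf
```

The black-box routines only query the model's output confidences, matching the minimal-adversarial-knowledge setting in the abstract, while the FGSM step requires gradient access and is used to study transferability across classifiers.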
