Paper Title
Unifying Gradients to Improve Real-world Robustness for Deep Networks
Paper Authors
Paper Abstract
The wide application of deep neural networks (DNNs) demands increasing attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks. Among these, score-based query attacks (SQAs) are the most threatening, since they can effectively hurt a victim network with access only to its outputs. Defending against SQAs requires a slight but artful variation of the outputs, because legitimate users rely on the same output information that SQAs observe. In this paper, we propose a real-world defense that Unifies Gradients (UniG) of different data, so that SQAs can only probe a much weaker attack direction that is similar across samples. Since such universal attack perturbations have been validated as less aggressive than input-specific perturbations, UniG protects real-world DNNs by presenting attackers with a twisted and less informative attack direction. We implement UniG efficiently as a plug-and-play Hadamard product module. Extensive experiments on 5 SQAs, 2 adaptive attacks, and 7 defense baselines show that UniG significantly improves real-world robustness without hurting clean accuracy on CIFAR10 and ImageNet. For instance, UniG maintains 77.80% accuracy under a 2500-query Square attack, whereas the state-of-the-art adversarially trained model achieves only 67.34% on CIFAR10. Meanwhile, UniG outperforms all compared baselines in clean accuracy and achieves the smallest modification of the model output. The code is released at https://github.com/snowien/UniG-pytorch.
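The abstract describes UniG as a plug-and-play Hadamard product module. Below is a minimal PyTorch sketch of what such a module could look like: a learnable element-wise scaling tensor inserted between a backbone's feature extractor and its classifier head. The class names, tensor shape, and insertion point are illustrative assumptions, not the authors' released implementation; see the UniG-pytorch repository for the actual code.

```python
# Minimal sketch (NOT the authors' released code): a plug-and-play module that
# applies a Hadamard (element-wise) product to an intermediate feature map.
# The shape of the scaling tensor and where it is inserted are assumptions
# made for illustration only.
import torch
import torch.nn as nn


class HadamardModule(nn.Module):
    """Element-wise scaling of a feature map by a learnable tensor."""

    def __init__(self, feature_shape):
        super().__init__()
        # Initialized to ones so the wrapped model's clean predictions are
        # unchanged before the scaling tensor is adjusted.
        self.scale = nn.Parameter(torch.ones(feature_shape))

    def forward(self, x):
        # Hadamard product, broadcast over the batch dimension.
        return x * self.scale


class WrappedModel(nn.Module):
    """Insert the Hadamard module between a feature extractor and its head."""

    def __init__(self, feature_extractor, classifier, feature_shape):
        super().__init__()
        self.feature_extractor = feature_extractor
        self.hadamard = HadamardModule(feature_shape)
        self.classifier = classifier

    def forward(self, x):
        feats = self.feature_extractor(x)
        return self.classifier(self.hadamard(feats))
```

In the paper's framework, this scaling tensor would be adjusted so that the input gradients of different samples become similar while the model outputs change only minimally; the precise optimization objective is specified in the paper and released code rather than reproduced here.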