Paper Title

DAD: Data-free Adversarial Defense at Test Time

Paper Authors

Gaurav Kumar Nayak, Ruchit Rawal, Anirban Chakraborty

Paper Abstract

Deep models are highly susceptible to adversarial attacks. Such attacks are carefully crafted imperceptible noises that can fool the network and cause severe consequences when the model is deployed. To counter them, the model requires training data for adversarial training or explicit regularization-based techniques. However, privacy has become an important concern, restricting access to only trained models and not the training data (e.g. biometric data). Also, data curation is expensive and companies may hold proprietary rights over it. To handle such situations, we propose a completely novel problem of 'test-time adversarial defense in the absence of training data and even their statistics'. We solve it in two stages: a) detection and b) correction of adversarial samples. Our adversarial sample detection framework is initially trained on arbitrary data and is subsequently adapted to the unlabelled test data through unsupervised domain adaptation. We further correct the predictions on detected adversarial samples by transforming them in the Fourier domain and obtaining their low-frequency component at our proposed suitable radius for model prediction. We demonstrate the efficacy of our proposed technique via extensive experiments against several adversarial attacks and for different model architectures and datasets. For a non-robust ResNet-18 model pre-trained on CIFAR-10, our detection method correctly identifies 91.42% of adversaries. Also, we significantly improve the adversarial accuracy from 0% to 37.37% with a minimal drop of 0.02% in clean accuracy against the state-of-the-art 'Auto Attack', without having to retrain the model.
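
To make the correction stage more concrete, the following is a minimal sketch of the low-frequency filtering idea described in the abstract: a detected adversarial image is transformed to the Fourier domain, only the coefficients within a chosen radius of the spectrum centre are kept, and the inverse transform is fed to the model. The function names and the default radius of 8.0 are illustrative assumptions for this sketch, not the authors' exact implementation; the paper proposes how to choose a suitable radius.

```python
# Sketch: correct a detected adversarial sample by keeping only its
# low-frequency Fourier component before prediction (assumed details).
import numpy as np
import torch


def low_frequency_component(image: np.ndarray, radius: float) -> np.ndarray:
    """Keep only Fourier coefficients within `radius` of the spectrum centre.

    `image` is a (C, H, W) array in [0, 1]; the mask is applied per channel.
    """
    c, h, w = image.shape
    # Distance of each frequency bin from the centre of the shifted spectrum.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = (dist <= radius).astype(image.dtype)

    filtered = np.empty_like(image)
    for ch in range(c):
        spectrum = np.fft.fftshift(np.fft.fft2(image[ch]))  # to Fourier domain
        spectrum *= mask                                     # zero high frequencies
        filtered[ch] = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))
    return np.clip(filtered, 0.0, 1.0)


def corrected_prediction(model: torch.nn.Module,
                         image: np.ndarray,
                         radius: float = 8.0) -> int:
    """Predict on the low-frequency version of a detected adversarial sample."""
    lf_image = low_frequency_component(image, radius)
    x = torch.from_numpy(lf_image).float().unsqueeze(0)  # add batch dimension
    with torch.no_grad():
        logits = model(x)
    return int(logits.argmax(dim=1).item())
```

In this sketch the filter simply discards high-frequency content, which is where much of the adversarial perturbation is expected to reside, while the radius trades off robustness against clean accuracy.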
