Paper Title

Query-Free Adversarial Transfer via Undertrained Surrogates

Paper Authors

Chris Miller, Soroush Vosoughi

Paper Abstract

Deep neural networks are vulnerable to adversarial examples -- minor perturbations added to a model's input which cause the model to output an incorrect prediction. We introduce a new method for improving the efficacy of adversarial attacks in a black-box setting by undertraining the surrogate model which the attacks are generated on. Using two datasets and five model architectures, we show that this method transfers well across architectures and outperforms state-of-the-art methods by a wide margin. We interpret the effectiveness of our approach as a function of reduced surrogate model loss function curvature and increased universal gradient characteristics, and show that our approach reduces the presence of local loss maxima which hinder transferability. Our results suggest that finding strong single surrogate models is a highly effective and simple method for generating transferable adversarial attacks, and that this method represents a valuable route for future study in this field.
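
The abstract describes a simple query-free pipeline: craft adversarial examples on an undertrained surrogate (an early training checkpoint) and transfer them to a black-box target without ever querying the target during crafting. Below is a minimal PyTorch sketch of that setup, not the authors' released code; standard L-infinity PGD stands in for the attack-generation step, and the model variables and checkpoint name (`surrogate`, `target`, `resnet18_epoch3.pt`) are hypothetical.

```python
# Minimal sketch of query-free transfer from an undertrained surrogate.
# Assumes `surrogate` and `target` are trained classifiers and (x, y) is a
# batch of inputs in [0, 1] with integer labels.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD crafted entirely on the surrogate model."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the surrogate's loss, then project back into the eps-ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

# Hypothetical usage: load an early-epoch checkpoint as the surrogate, then
# measure transfer to the black-box target (queried only for evaluation).
# surrogate.load_state_dict(torch.load("resnet18_epoch3.pt"))  # undertrained
# x_adv = pgd_attack(surrogate.eval(), x, y)
# transfer_rate = (target(x_adv).argmax(1) != y).float().mean()
```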
