Paper Title


Unrestricted Adversarial Samples Based on Non-semantic Feature Clusters Substitution

Authors

MingWei Zhou, Xiaobing Pei

Abstract


Most current methods generate adversarial examples under an $L_p$-norm constraint. As a result, many defense methods exploit this property to eliminate the impact of such attack algorithms. In this paper, we instead introduce "unrestricted" perturbations that create adversarial samples by exploiting spurious relations learned during model training. Specifically, we find feature clusters among non-semantic features that are strongly correlated with model predictions, and treat them as spurious relations learned by the model. We then create adversarial samples by using these clusters to replace the corresponding feature clusters in the target image. Experimental evaluations show that, in both black-box and white-box settings, our adversarial examples do not change the semantics of the image while still effectively fooling adversarially trained DNN image classifiers.
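The core idea in the abstract can be illustrated with a minimal sketch. The assumption here (not the paper's exact definition) is that high-frequency image content stands in for "non-semantic" features: we keep the target image's low-frequency (semantic) content and substitute the high-frequency component taken from another image. All function names below are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def split_frequency(img, radius=8):
    """Split a grayscale image (values in [0, 1]) into low- and
    high-frequency parts via a circular low-pass mask in Fourier space.
    The high-frequency part is used here as a stand-in for the paper's
    'non-semantic' features (an illustrative assumption)."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    high = img - low  # exact residual, so low + high reconstructs img
    return low, high

def substitute_clusters(target, source, radius=8):
    """Build a candidate 'unrestricted' adversarial image: keep the
    target's low-frequency (semantic) content and replace its
    high-frequency component with the source's."""
    low_t, _ = split_frequency(target, radius)
    _, high_s = split_frequency(source, radius)
    return np.clip(low_t + high_s, 0.0, 1.0)
```

Because the substituted component is not bounded by any $L_p$ ball around the target, such a perturbation is "unrestricted" in the abstract's sense, even though the low-frequency semantics are preserved.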
