Paper Title
BERT-ATTACK: Adversarial Attack Against BERT Using BERT
Paper Authors
Paper Abstract
Adversarial attacks on discrete data (such as text) have proven significantly more challenging than attacks on continuous data (such as images), since it is difficult to generate adversarial samples with gradient-based methods. Current successful attack methods for text usually adopt heuristic replacement strategies at the character or word level, but it remains challenging to find the optimal solution in the massive space of possible replacement combinations while preserving semantic consistency and language fluency. In this paper, we propose \textbf{BERT-Attack}, a high-quality and effective method for generating adversarial samples using pre-trained masked language models, exemplified by BERT. We turn BERT against its fine-tuned counterparts and other deep neural models on downstream tasks, successfully misleading the target models into incorrect predictions. Our method outperforms state-of-the-art attack strategies in both success rate and perturbation percentage, while the generated adversarial samples remain fluent and semantically preserved. Moreover, the computational cost is low, making large-scale generation feasible. The code is available at https://github.com/LinyangLee/BERT-Attack.
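To make the core idea concrete, below is a minimal sketch in Python using the HuggingFace transformers library: it masks one word at a time, lets BERT's masked-LM head propose contextually plausible substitutes, and greedily accepts the first substitute that flips a victim classifier's label. The model names, the `victim` pipeline, the greedy left-to-right word order, and the `attack` helper are illustrative assumptions, not the authors' exact implementation; the actual BERT-Attack additionally ranks words by importance before substitution and handles sub-word tokens explicitly.

```python
# Illustrative sketch of masked-LM-based adversarial substitution.
# Assumptions: bert-base-uncased as the MLM, a default HuggingFace
# sentiment classifier as the victim, greedy word-by-word search.
import torch
from transformers import BertTokenizer, BertForMaskedLM, pipeline

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

# Hypothetical victim model: any fine-tuned text classifier works here.
victim = pipeline("sentiment-analysis")

def candidate_substitutes(sentence: str, word_index: int, k: int = 8):
    """Mask the word at `word_index` and let BERT's MLM head propose
    the top-k contextually plausible replacements."""
    words = sentence.split()
    masked = words.copy()
    masked[word_index] = tokenizer.mask_token
    inputs = tokenizer(" ".join(masked), return_tensors="pt")
    mask_idx = (inputs["input_ids"][0] == tokenizer.mask_token_id) \
        .nonzero(as_tuple=True)[0].item()
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, mask_idx]  # shape: [vocab_size]
    top_ids = logits.topk(k).indices.tolist()
    toks = [tokenizer.convert_ids_to_tokens(i) for i in top_ids]
    # Drop sub-word pieces for simplicity; the paper treats them explicitly.
    return [t for t in toks if not t.startswith("##")]

def attack(sentence: str):
    """Greedily try MLM-proposed substitutes word by word until the
    victim's label flips; return the adversarial sentence or None."""
    orig_label = victim(sentence)[0]["label"]
    words = sentence.split()
    for i in range(len(words)):
        for sub in candidate_substitutes(sentence, i):
            perturbed = words.copy()
            perturbed[i] = sub
            candidate = " ".join(perturbed)
            if victim(candidate)[0]["label"] != orig_label:
                return candidate
    return None

print(attack("the movie was a wonderful surprise"))
```

Because the substitutes come from a masked language model conditioned on the full context rather than from a static synonym list, the perturbed sentences tend to stay fluent, which is the property the abstract highlights.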