Paper Title

Rewriting Meaningful Sentences via Conditional BERT Sampling and an application on fooling text classifiers

Authors

Lei Xu, Ivan Ramirez, Kalyan Veeramachaneni

Abstract

Most adversarial attack methods that are designed to deceive a text classifier change the classifier's prediction by modifying a few words or characters. Few try to attack classifiers by rewriting a whole sentence, due to the difficulties inherent in sentence-level rephrasing as well as the problem of setting criteria for legitimate rewriting. In this paper, we explore the problem of creating adversarial examples with sentence-level rewriting. We design a new sampling method, named ParaphraseSampler, to efficiently rewrite the original sentence in multiple ways. We then propose a new criterion for modification, called a sentence-level threat model. This criterion allows both word- and sentence-level changes, and can be adjusted independently along two dimensions: semantic similarity and grammatical quality. Experimental results show that many of these rewritten sentences are misclassified by the classifier. On all 6 datasets, our ParaphraseSampler achieves a better attack success rate than our baseline.
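
The abstract names ParaphraseSampler but gives no mechanics beyond "conditional BERT sampling." As a rough, hedged illustration of that general family of techniques (mask part of a sentence and resample the masked tokens from a masked language model's conditional distribution), the sketch below uses HuggingFace BERT. The function name rewrite and the mask_fraction / num_samples hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of masked-LM resampling for sentence rewriting, in the
# spirit of "conditional BERT sampling". NOT the paper's ParaphraseSampler:
# `rewrite`, `mask_fraction`, and `num_samples` are illustrative assumptions.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def rewrite(sentence: str, mask_fraction: float = 0.3, num_samples: int = 5):
    """Mask a random fraction of tokens and resample them from BERT's
    conditional distribution, yielding several candidate rewrites."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    positions = torch.arange(1, ids.size(1) - 1)  # skip [CLS] and [SEP]
    n_mask = max(1, int(mask_fraction * positions.numel()))
    candidates = []
    for _ in range(num_samples):
        chosen = positions[torch.randperm(positions.numel())[:n_mask]]
        masked = ids.clone()
        masked[0, chosen] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked).logits
        # Sample (rather than argmax) so repeated calls stay diverse.
        probs = torch.softmax(logits[0, chosen], dim=-1)
        new_ids = ids.clone()
        new_ids[0, chosen] = torch.multinomial(probs, 1).squeeze(-1)
        candidates.append(tokenizer.decode(new_ids[0, 1:-1].tolist()))
    return candidates

print(rewrite("the movie was a complete waste of time"))
```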

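The sentence-level threat model is specified only as two independently tunable dimensions. A hedged sketch of such an acceptance check follows, assuming embedding cosine similarity as the semantic metric and GPT-2 perplexity as a proxy for grammatical quality; both are common stand-ins, and the paper's actual metrics and thresholds may differ.

```python
# Hedged sketch of a two-dimensional sentence-level acceptance check.
# The metrics (MiniLM cosine similarity, GPT-2 perplexity) and thresholds
# are assumptions, not necessarily those used in the paper.
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import GPT2LMHeadModel, GPT2Tokenizer

sim_model = SentenceTransformer("all-MiniLM-L6-v2")
gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence: str) -> float:
    """GPT-2 perplexity as a rough fluency/grammaticality score."""
    ids = gpt2_tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        return float(torch.exp(gpt2(ids, labels=ids).loss))

def accept(original: str, candidate: str,
           min_similarity: float = 0.8, max_ppl_ratio: float = 2.0) -> bool:
    """Accept a rewrite only if it stays semantically close to the original
    (dimension 1) and its fluency does not degrade too much (dimension 2).
    Tightening either threshold constrains that dimension independently."""
    emb = sim_model.encode([original, candidate], convert_to_tensor=True)
    close = util.cos_sim(emb[0], emb[1]).item() >= min_similarity
    fluent = perplexity(candidate) <= max_ppl_ratio * perplexity(original)
    return close and fluent
```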