Certified Robustness Against Natural Language Attacks by Causal Intervention

Paper Title

Certified Robustness Against Natural Language Attacks by Causal Intervention

Authors

Haiteng Zhao, Chang Ma, Xinshuai Dong, Anh Tuan Luu, Zhi-Hong Deng, Hanwang Zhang

Abstract

Deep learning models have achieved great success in many fields, yet they are vulnerable to adversarial examples. This paper follows a causal perspective to look into the adversarial vulnerability and proposes Causal Intervention by Semantic Smoothing (CISS), a novel framework towards robustness against natural language attacks. Instead of merely fitting observational data, CISS learns causal effects p(y|do(x)) by smoothing in the latent semantic space to make robust predictions, which scales to deep architectures and avoids tedious construction of noise customized for specific attacks. CISS is provably robust against word substitution attacks, as well as empirically robust even when perturbations are strengthened by unknown attack algorithms. For example, on YELP, CISS surpasses the runner-up by 6.7% in terms of certified robustness against word substitutions, and achieves 79.4% empirical robustness when syntactic attacks are integrated.
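
As a rough illustration of what "smoothing in the latent semantic space" can look like, the sketch below takes a majority vote over Gaussian perturbations of a latent vector. It is a generic randomized-smoothing toy, not the authors' CISS implementation: `smoothed_predict`, `toy_classifier`, the noise scale `sigma`, and the sample count are all assumptions made for illustration, and the abstract does not specify the noise distribution or the certification procedure.

```python
import random
from collections import Counter


def smoothed_predict(classify, z, sigma=0.5, n_samples=200, seed=0):
    """Majority vote of `classify` over Gaussian perturbations of latent vector `z`.

    Returns the winning label and the share of votes it received.
    All parameters here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        # Perturb every latent coordinate with independent Gaussian noise.
        noisy = [zi + rng.gauss(0.0, sigma) for zi in z]
        votes[classify(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


def toy_classifier(z):
    # Hypothetical stand-in for a classifier head on top of a text encoder.
    return 1 if sum(z) > 0 else 0


# A latent point far from the decision boundary keeps its label under noise.
label, share = smoothed_predict(toy_classifier, [2.0, 1.5, 1.0])
print(label, share)
```

The vote share gives an informal sense of how stable the prediction is under latent noise; turning such a share into a formal certificate requires the statistical argument developed in the paper itself.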

Paper Title