Paper Title


You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion

Paper Authors

Roei Schuster, Congzheng Song, Eran Tromer, Vitaly Shmatikov

Paper Abstract


Code autocompletion is an integral feature of modern code editors and IDEs. The latest generation of autocompleters uses neural language models, trained on public open-source code repositories, to suggest likely (not just statically feasible) completions given the current context. We demonstrate that neural code autocompleters are vulnerable to poisoning attacks. By adding a few specially-crafted files to the autocompleter's training corpus (data poisoning), or else by directly fine-tuning the autocompleter on these files (model poisoning), the attacker can influence its suggestions for attacker-chosen contexts. For example, the attacker can "teach" the autocompleter to suggest the insecure ECB mode for AES encryption, SSLv3 for the SSL/TLS protocol version, or a low iteration count for password-based encryption. Moreover, we show that these attacks can be targeted: an autocompleter poisoned by a targeted attack is much more likely to suggest the insecure completion for files from a specific repo or specific developer. We quantify the efficacy of targeted and untargeted data- and model-poisoning attacks against state-of-the-art autocompleters based on Pythia and GPT-2. We then evaluate existing defenses against poisoning attacks and show that they are largely ineffective.
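To make the abstract's attack targets concrete, below is a minimal illustrative sketch (not taken from the paper) of the three insecure completions it names, as they might appear in Python code. The specific libraries (pycryptodome's Crypto.Cipher, and the standard ssl and hashlib modules) and the literal values are assumptions chosen only to illustrate the kind of "bait" a poisoned autocompleter would suggest.

```python
# Illustrative sketch only: the insecure completions named in the abstract,
# contrasted with safer alternatives. Library choices are assumptions.

import ssl
import hashlib
from Crypto.Cipher import AES  # pycryptodome (assumed library)

key = b"0123456789abcdef"  # 16-byte AES key, placeholder value

# 1. Insecure suggestion: ECB mode for AES (leaks plaintext patterns).
#    A safer completion would be an authenticated mode such as AES.MODE_GCM.
cipher = AES.new(key, AES.MODE_ECB)

# 2. Insecure suggestion: pinning the protocol to SSLv3.
#    PROTOCOL_SSLv3 is absent from most modern Python builds, so it is shown
#    only as a comment; the safer completion negotiates current TLS versions.
# context = ssl.SSLContext(ssl.PROTOCOL_SSLv3)   # attacker-chosen "bait"
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # safer default

# 3. Insecure suggestion: a low iteration count for password-based key derivation.
#    A safer completion would use a far higher iteration count.
dk = hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 1000)  # 1000 is far too low
```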
