Paper Title
Memorization in NLP Fine-tuning Methods
Paper Authors
Paper Abstract
Large language models are shown to present privacy risks through memorization of training data, and several recent works have studied such risks for the pre-training phase. Little attention, however, has been given to the fine-tuning phase, and it is not well understood how different fine-tuning methods (such as fine-tuning the full model, the model head, or adapters) compare in terms of memorization risk. This is of increasing concern as the "pre-train and fine-tune" paradigm proliferates. In this paper, we empirically study memorization of fine-tuning methods using membership inference and extraction attacks, and show that their susceptibility to attacks is very different. We observe that fine-tuning the head of the model has the highest susceptibility to attacks, whereas fine-tuning smaller adapters appears to be less vulnerable to known extraction attacks.
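As a rough illustration of the kind of attack studied here (not the paper's actual method), the sketch below shows a minimal loss-threshold membership inference test against a fine-tuned causal language model: a candidate record is scored by its per-token loss under the model, and a low loss is taken as evidence of membership in the fine-tuning set. The model name, threshold, and candidate text are placeholder assumptions.

```python
# Illustrative sketch only: loss-based membership inference against a
# fine-tuned causal LM. Names and values below are placeholders, not
# taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_loss(model, tokenizer, text):
    """Average per-token negative log-likelihood of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

model_name = "gpt2"  # stand-in for a fine-tuned checkpoint (full, head-only, or adapter)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

candidate = "An example record that may have appeared in the fine-tuning data."
threshold = 3.0  # hypothetical; in practice calibrated on held-out non-member data

loss = sequence_loss(model, tokenizer, candidate)
print("member" if loss < threshold else "non-member", f"(loss={loss:.2f})")
```

Comparing how sharply member and non-member losses separate across fine-tuning methods (full model, head only, adapters) is one simple way to probe the differences in memorization the abstract describes.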