Paper Title
Memorization in NLP Fine-tuning Methods
Paper Authors
Paper Abstract
Large language models are shown to present privacy risks through memorization of training data, and several recent works have studied such risks for the pre-training phase. Little attention, however, has been given to the fine-tuning phase, and it is not well understood how different fine-tuning methods (such as fine-tuning the full model, the model head, or adapters) compare in terms of memorization risk. This is of increasing concern as the "pre-train and fine-tune" paradigm proliferates. In this paper, we empirically study memorization of fine-tuning methods using membership inference and extraction attacks, and show that their susceptibility to attacks is very different. We observe that fine-tuning the head of the model has the highest susceptibility to attacks, whereas fine-tuning smaller adapters appears to be less vulnerable to known extraction attacks.
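As a rough illustration of the kind of attack studied here (not the paper's actual method), the sketch below shows a minimal loss-threshold membership inference test against a fine-tuned causal language model: a candidate record is scored by its per-token loss under the model, and a low loss is taken as evidence of membership in the fine-tuning set. The model name, threshold, and candidate text are placeholder assumptions.

```python
# Illustrative sketch only: loss-based membership inference against a
# fine-tuned causal LM. Names and values below are placeholders, not
# taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_loss(model, tokenizer, text):
    """Average per-token negative log-likelihood of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

model_name = "gpt2"  # stand-in for a fine-tuned checkpoint (full, head-only, or adapter)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

candidate = "An example record that may have appeared in the fine-tuning data."
threshold = 3.0  # hypothetical; in practice calibrated on held-out non-member data

loss = sequence_loss(model, tokenizer, candidate)
print("member" if loss < threshold else "non-member", f"(loss={loss:.2f})")
```

Comparing how sharply member and non-member losses separate across fine-tuning methods (full model, head only, adapters) is one simple way to probe the differences in memorization the abstract describes.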