Paper Title

An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training

Authors

Kristjan Arumae, Qing Sun, Parminder Bhatia

Abstract

Pre-training large language models has become a standard in the natural language processing community. Such models are pre-trained on generic data (e.g. BookCorpus and English Wikipedia) and often fine-tuned on tasks in the same domain. However, in order to achieve state-of-the-art performance on out of domain tasks such as clinical named entity recognition and relation extraction, additional in domain pre-training is required. In practice, staged multi-domain pre-training presents performance deterioration in the form of catastrophic forgetting (CF) when evaluated on a generic benchmark such as GLUE. In this paper we conduct an empirical investigation into known methods to mitigate CF. We find that elastic weight consolidation provides best overall scores yielding only a 0.33% drop in performance across seven generic tasks while remaining competitive in bio-medical tasks. Furthermore, we explore gradient and latent clustering based data selection techniques to improve coverage when using elastic weight consolidation and experience replay methods.
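
The abstract names elastic weight consolidation (EWC) as the best-performing mitigation for catastrophic forgetting. As a point of reference only, the sketch below shows the standard EWC penalty of Kirkpatrick et al. (2017): a quadratic term that anchors parameters to their values after generic-domain pre-training, weighted by a diagonal Fisher information estimate. The names `anchor_params`, `fisher`, and `lambda_ewc`, and the default regularization strength, are illustrative assumptions and not the paper's settings.

```python
import torch


def ewc_penalty(model, anchor_params, fisher, lambda_ewc=0.1):
    """Standard EWC regularizer: 0.5 * lambda * sum_i F_i * (theta_i - theta*_i)^2.

    anchor_params: dict of parameter tensors saved after generic-domain pre-training.
    fisher: dict of per-parameter diagonal Fisher information estimates.
    """
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - anchor_params[name]) ** 2).sum()
    return 0.5 * lambda_ewc * penalty


# During in-domain pre-training, the penalty is added to the task loss, e.g.:
# loss = mlm_loss + ewc_penalty(model, anchor_params, fisher)
```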
