Paper Title
To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks
Paper Authors
Paper Abstract
Pretraining NLP models with variants of the Masked Language Model (MLM) objective has recently led to significant improvements on many tasks. This paper examines the benefits of pretrained models as a function of the number of training samples used in the downstream task. On several text classification tasks, we show that as the number of training examples grows into the millions, the accuracy gap between finetuning a BERT-based model and training a vanilla LSTM from scratch narrows to within 1%. Our findings indicate that MLM-based models might reach a point of diminishing returns as the supervised data size increases significantly.
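The abstract contrasts two setups: finetuning a pretrained BERT classifier versus training a vanilla LSTM classifier from scratch, with accuracy tracked as the amount of supervised data grows. Below is a minimal sketch of that comparison, assuming PyTorch and the Hugging Face transformers library; the model names, architecture sizes, and the reuse of BERT's wordpiece vocabulary for the LSTM are illustrative assumptions, not the paper's actual experimental configuration.

```python
# Illustrative sketch (not the paper's code): a pretrained BERT classifier
# to be finetuned, versus a randomly initialized vanilla LSTM classifier
# trained from scratch on the same downstream text classification task.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_LABELS = 2  # assumed binary classification task

class LSTMClassifier(nn.Module):
    """Vanilla LSTM text classifier with no pretraining."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_labels=NUM_LABELS):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, input_ids):
        x = self.embed(input_ids)            # (batch, seq_len, embed_dim)
        _, (h, _) = self.lstm(x)             # h: (2, batch, hidden_dim)
        h = torch.cat([h[0], h[1]], dim=-1)  # concatenate both directions
        return self.classifier(h)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Pretrained model, finetuned on the downstream task.
bert = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_LABELS
)

# Trained-from-scratch baseline, reusing BERT's wordpiece vocabulary for simplicity.
lstm = LSTMClassifier(vocab_size=tokenizer.vocab_size)

@torch.no_grad()
def evaluate_accuracy(model, texts, labels, is_bert):
    """Compute accuracy of either model on a small batch of examples."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(**enc).logits if is_bert else model(enc["input_ids"])
    preds = logits.argmax(dim=-1)
    return (preds == torch.tensor(labels)).float().mean().item()
```

In the study's framing, both models would be trained on progressively larger subsets of the supervised data (up to millions of examples) and evaluated at each size, so the accuracy gap can be plotted as a function of training set size.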