Paper Title
ERNIE at SemEval-2020 Task 10: Learning Word Emphasis Selection by Pre-trained Language Model
Paper Authors
Paper Abstract
This paper describes the system designed by the ERNIE Team, which achieved first place in SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual Media. Given a sentence, the task is to identify the most important words as suggestions for automated design. We leverage unsupervised pre-trained models and fine-tune them on this task. After investigation, we found that the following models achieve excellent performance on this task: ERNIE 2.0, XLM-RoBERTa, RoBERTa, and ALBERT. To fine-tune our models, we combine a pointwise regression loss with a pairwise ranking loss, which is closer to the final Match_m metric. We also find that additional feature engineering and data augmentation help improve performance. Our best model achieves the highest score of 0.823 and ranks first across all evaluation metrics.
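The abstract only names the combination of a pointwise regression loss and a pairwise ranking loss without giving its form. Below is a minimal sketch of one way such a combined objective could look for token-level emphasis scores, assuming a PyTorch model that outputs one emphasis probability per token; the function name, the weighting factor alpha, and the margin are illustrative assumptions, not values or definitions taken from the paper.

```python
import torch
import torch.nn.functional as F

def combined_emphasis_loss(pred, target, alpha=0.5, margin=0.1):
    """Hypothetical combination of a pointwise regression loss and a
    pairwise ranking loss for word-emphasis selection.

    pred, target: 1-D tensors of shape (seq_len,) holding predicted and
    gold emphasis probabilities for the tokens of one sentence.
    alpha and margin are illustrative hyper-parameters, not from the paper.
    """
    # Pointwise term: regress each token's emphasis probability directly.
    pointwise = F.mse_loss(pred, target)

    # Pairwise term: for every token pair (i, j) whose gold emphasis
    # satisfies target[i] > target[j], encourage pred[i] to exceed
    # pred[j] by at least the margin (a hinge / margin ranking loss).
    diff_gold = target.unsqueeze(1) - target.unsqueeze(0)  # (L, L), [i, j] = target[i] - target[j]
    diff_pred = pred.unsqueeze(1) - pred.unsqueeze(0)      # (L, L), [i, j] = pred[i] - pred[j]
    mask = (diff_gold > 0).float()
    pairwise = (mask * F.relu(margin - diff_pred)).sum() / mask.sum().clamp(min=1.0)

    return pointwise + alpha * pairwise

# Example usage with dummy scores for a 5-token sentence.
pred = torch.tensor([0.1, 0.8, 0.3, 0.6, 0.2])
target = torch.tensor([0.0, 1.0, 0.2, 0.7, 0.1])
loss = combined_emphasis_loss(pred, target)
```

The pairwise term orders tokens relative to one another rather than fitting absolute scores, which is why a ranking-style loss is described as closer to the Match_m metric, since that metric rewards recovering the top-ranked emphasized words.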