Paper Title

Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss

Authors

Junjie Wang, Yuxiang Zhang, Ping Yang, Ruyi Gan

Abstract

This report describes Erlangshen, a pre-trained language model with a propensity-corrected loss, which ranked No.1 in the CLUE Semantic Matching Challenge. In the pre-training stage, we construct a dynamic masking strategy based on knowledge in Masked Language Modeling (MLM) with whole word masking. Furthermore, motivated by the specific structure of the dataset, the pre-trained Erlangshen applies a propensity-corrected loss (PCL) in the fine-tuning phase. Overall, we achieve an F1 score of 72.54 and an accuracy of 78.90 on the test set. Our code is publicly available at: https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/hf-ds/fengshen/examples/clue_sim.
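The abstract's dynamic whole-word masking can be illustrated with a minimal sketch. This is not the authors' actual implementation (see their repository for that); it assumes WordPiece-style sub-tokens where continuations are prefixed with `##`, and a hypothetical 15% default mask rate. The key ideas are that all sub-tokens of a selected word are masked together, and that the mask is resampled on every pass over the data rather than fixed once.

```python
import random

def whole_word_mask(tokens, mask_rate=0.15, mask_token="[MASK]", seed=None):
    """Illustrative whole-word masking sketch (not the paper's code).

    Groups WordPiece sub-tokens ('##'-prefixed continuations) into whole
    words, then masks every sub-token of each sampled word together.
    """
    rng = random.Random(seed)
    # Build word groups: each group is a list of token indices forming one word.
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    # Sample whole words (not individual sub-tokens) to mask.
    n_mask = max(1, int(round(len(groups) * mask_rate)))
    out = list(tokens)
    for group in rng.sample(groups, n_mask):
        for i in group:
            out[i] = mask_token
    return out

# "Dynamic" masking: call again with a fresh seed each epoch to resample.
tokens = ["se", "##man", "##tic", "matching", "with", "er", "##lang", "##shen"]
print(whole_word_mask(tokens, mask_rate=0.25, seed=0))
```

Because the mask is drawn per call, each training epoch sees a different masking pattern of the same sentence, in contrast to static masking computed once at preprocessing time.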
