Paper Title

Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss

Authors

Junjie Wang, Yuxiang Zhang, Ping Yang, Ruyi Gan

Abstract

This report describes Erlangshen, a pre-trained language model with a propensity-corrected loss, which ranked No.1 in the CLUE Semantic Matching Challenge. In the pre-training stage, we construct a dynamic masking strategy based on knowledge in Masked Language Modeling (MLM) with whole word masking. Furthermore, motivated by the specific structure of the dataset, the pre-trained Erlangshen applies a propensity-corrected loss (PCL) in the fine-tuning phase. Overall, we achieve an F1 score of 72.54 and an accuracy of 78.90 on the test set. Our code is publicly available at: https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/hf-ds/fengshen/examples/clue_sim.
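The abstract's dynamic whole-word masking can be illustrated with a minimal sketch. This is not the authors' actual implementation (see their repository for that); it assumes WordPiece-style sub-tokens where continuations are prefixed with `##`, and a hypothetical 15% default mask rate. The key ideas are that all sub-tokens of a selected word are masked together, and that the mask is resampled on every pass over the data rather than fixed once.

```python
import random

def whole_word_mask(tokens, mask_rate=0.15, mask_token="[MASK]", seed=None):
    """Illustrative whole-word masking sketch (not the paper's code).

    Groups WordPiece sub-tokens ('##'-prefixed continuations) into whole
    words, then masks every sub-token of each sampled word together.
    """
    rng = random.Random(seed)
    # Build word groups: each group is a list of token indices forming one word.
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    # Sample whole words (not individual sub-tokens) to mask.
    n_mask = max(1, int(round(len(groups) * mask_rate)))
    out = list(tokens)
    for group in rng.sample(groups, n_mask):
        for i in group:
            out[i] = mask_token
    return out

# "Dynamic" masking: call again with a fresh seed each epoch to resample.
tokens = ["se", "##man", "##tic", "matching", "with", "er", "##lang", "##shen"]
print(whole_word_mask(tokens, mask_rate=0.25, seed=0))
```

Because the mask is drawn per call, each training epoch sees a different masking pattern of the same sentence, in contrast to static masking computed once at preprocessing time.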
