Paper Title
UBERT: A Novel Language Model for Synonymy Prediction at Scale in the UMLS Metathesaurus
Paper Authors
Abstract
The UMLS Metathesaurus integrates more than 200 biomedical source vocabularies. During the Metathesaurus construction process, synonymous terms are clustered into concepts by human editors, assisted by lexical similarity algorithms. This process is error-prone and time-consuming. Recently, a deep learning model (LexLM) was developed for the UMLS Vocabulary Alignment (UVA) task. This work introduces UBERT, a BERT-based language model pretrained on UMLS terms via a supervised Synonymy Prediction (SP) task that replaces the original Next Sentence Prediction (NSP) task. The effectiveness of UBERT for the UMLS Metathesaurus construction process is evaluated using the UMLS Vocabulary Alignment (UVA) task. We show that UBERT outperforms LexLM, as well as biomedical BERT-based models. Key to the performance of UBERT are the synonymy prediction task specifically developed for UBERT, the tight alignment of the training data to the UVA task, and the similarity of the models used for pretraining UBERT.
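The supervised SP task described above is a binary sentence-pair classification problem: positive examples are term pairs that share a UMLS concept (CUI), negatives pair terms from different concepts. As a minimal sketch of how such training pairs could be built from (CUI, term) records, the helper below is hypothetical (the function name, the sampling scheme, and the example CUIs/terms are illustrative assumptions, not the paper's actual data pipeline):

```python
import itertools
import random

def make_sp_pairs(atoms, seed=0):
    """Build Synonymy Prediction training pairs from (CUI, term) records.

    Positive pairs (label 1): two terms under the same CUI, i.e. synonyms.
    Negative pairs (label 0): one negative sampled per positive, pairing
    terms drawn from two different CUIs.
    """
    rng = random.Random(seed)

    # Group terms by concept identifier.
    by_cui = {}
    for cui, term in atoms:
        by_cui.setdefault(cui, []).append(term)

    # Positives: all within-concept term combinations.
    pairs = [
        (a, b, 1)
        for terms in by_cui.values()
        for a, b in itertools.combinations(terms, 2)
    ]

    # Negatives: for each positive, pick two distinct CUIs and one term each.
    cuis = list(by_cui)
    for _ in range(len(pairs)):
        cui_a, cui_b = rng.sample(cuis, 2)
        pairs.append((rng.choice(by_cui[cui_a]), rng.choice(by_cui[cui_b]), 0))

    rng.shuffle(pairs)
    return pairs

# Illustrative toy records (example CUIs/terms, not real UMLS data).
atoms = [
    ("C0004238", "atrial fibrillation"),
    ("C0004238", "auricular fibrillation"),
    ("C0004238", "AF"),
    ("C0027051", "myocardial infarction"),
    ("C0027051", "heart attack"),
]
pairs = make_sp_pairs(atoms)
```

Each resulting `(term_a, term_b, label)` triple could then be fed to a BERT-style model as a `[CLS] term_a [SEP] term_b [SEP]` input with a binary classification head, in place of the NSP objective.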