Paper Title
Multi-Ontology Refined Embeddings (MORE): A Hybrid Multi-Ontology and Corpus-based Semantic Representation for Biomedical Concepts
Paper Authors
Paper Abstract
Objective: Currently, a major limitation for natural language processing (NLP) analyses in clinical applications is that a concept can be referenced in various forms across different texts. This paper introduces Multi-Ontology Refined Embeddings (MORE), a novel hybrid framework for incorporating domain knowledge from multiple ontologies into a distributional semantic model learned from a corpus of clinical text. Materials and Methods: We use the RadCore and MIMIC-III free-text datasets for the corpus-based component of MORE. For the ontology-based part, we use the Medical Subject Headings (MeSH) ontology and three state-of-the-art ontology-based similarity measures. In our approach, we propose a new learning objective, modified from the sigmoid cross-entropy objective function. Results and Discussion: We evaluate the quality of the generated word embeddings using two established datasets of semantic similarities among biomedical concept pairs. On the first dataset, with 29 concept pairs whose similarity scores were established by physicians and medical coders, MORE's similarity scores have the highest combined correlation (0.633), which is 5.0% higher than that of the baseline model and 12.4% higher than that of the best ontology-based similarity measure. On the second dataset, with 449 concept pairs, MORE's similarity scores have a correlation of 0.481 with the average of four medical residents' similarity ratings, outperforming the skip-gram model by 8.1% and the best ontology measure by 6.9%.
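The abstract does not spell out the modified learning objective. As a hedged illustration only, the sketch below shows one way a skip-gram-style sigmoid cross-entropy term could blend corpus co-occurrence with an ontology-derived similarity score by softening the positive target; the function names, the blending scheme, and the negative-sampling setup are assumptions for illustration, not the paper's stated formulation.

```python
import numpy as np

def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def refined_pair_loss(center_vec, context_vec, neg_vecs, onto_sim):
    """Illustrative sigmoid cross-entropy loss for one (center, context)
    pair plus k negative samples, where the positive target is an
    ontology-based similarity scaled to [0, 1] instead of a hard 1.

    center_vec  : (d,) array, embedding of the center concept
    context_vec : (d,) array, embedding of the observed context concept
    neg_vecs    : (k, d) array, embeddings of k negative samples
    onto_sim    : float in [0, 1], ontology-based similarity of the pair
    """
    # Positive term: the soft target blends corpus co-occurrence (the pair
    # was observed) with ontology knowledge (how similar the concepts are).
    p_pos = sigmoid(center_vec @ context_vec)
    loss = -(onto_sim * np.log(p_pos) + (1.0 - onto_sim) * np.log(1.0 - p_pos))

    # Negative samples keep the usual hard 0 target of negative sampling.
    p_neg = sigmoid(neg_vecs @ center_vec)
    loss += -np.sum(np.log(1.0 - p_neg))
    return loss
```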
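The reported correlations with human ratings suggest a standard intrinsic evaluation in which cosine similarities between concept embeddings are compared against reference scores. The sketch below assumes Pearson correlation and uses illustrative names; the paper's exact correlation measure and score-combination scheme may differ.

```python
import numpy as np
from scipy.stats import pearsonr

def similarity_correlation(embeddings, concept_pairs, human_scores):
    """Correlate cosine similarities of concept-pair embeddings with
    reference (human) similarity ratings given in the same order.

    embeddings    : dict mapping a concept string to a 1-D numpy vector
    concept_pairs : list of (concept_a, concept_b) string tuples
    human_scores  : list of floats, reference rating for each pair
    """
    cosine_sims = []
    for a, b in concept_pairs:
        va, vb = embeddings[a], embeddings[b]
        cosine_sims.append(float(va @ vb) /
                           (np.linalg.norm(va) * np.linalg.norm(vb)))
    r, _ = pearsonr(cosine_sims, human_scores)
    return r
```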