Paper Title

NUBes: A Corpus of Negation and Uncertainty in Spanish Clinical Texts

Authors

Lima, Salvador; Perez, Naiara; Cuadros, Montse; Rigau, German

Abstract

This paper introduces the first version of the NUBes corpus (Negation and Uncertainty annotations in Biomedical texts in Spanish). The corpus is part of ongoing research and currently consists of 29,682 sentences obtained from anonymised health records and annotated with negation and uncertainty. The article includes an exhaustive comparison with similar corpora in Spanish and presents the main annotation and design decisions. Additionally, we perform preliminary experiments using deep learning algorithms to validate the annotated dataset. As far as we know, NUBes is the largest publicly available corpus for negation in Spanish and the first that also incorporates the annotation of speculation cues, scopes, and events.
