The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria

Paper title

The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria

Authors

Dobbins, Nicholas J; Mullen, Tony; Uzuner, Ozlem; Yetisgen, Meliha

Abstract

Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free text, using language familiar to clinicians and researchers. To identify potential participants at scale, these criteria must first be translated into queries on clinical databases, a process that can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of performing this conversion into database queries automatically. However, they must first be trained and evaluated using corpora that capture clinical trial criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions using highly granular structured labels capturing a range of biomedical phenomena. We provide details of our schema, annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as benchmarks for future work.
