Paper Title
Annotating Social Determinants of Health Using Active Learning, and Characterizing Determinants Using Neural Event Extraction
Authors
Abstract
Social determinants of health (SDOH) affect health outcomes, and knowledge of SDOH can inform clinical decision-making. Automatically extracting SDOH information from clinical text requires data-driven information extraction models trained on annotated corpora that are heterogeneous and frequently include critical SDOH. This work presents a new corpus with SDOH annotations, a novel active learning framework, and the first extraction results on the new corpus. The Social History Annotation Corpus (SHAC) includes 4,480 social history sections with detailed annotation for 12 SDOH characterizing the status, extent, and temporal information of 18K distinct events. We introduce a novel active learning framework that selects samples for annotation using a surrogate text classification task as a proxy for a more complex event extraction task. The active learning framework successfully increases the frequency of health risk factors and improves automatic extraction of these events over undirected annotation. An event extraction model trained on SHAC achieves high extraction performance for substance use status (0.82-0.93 F1), employment status (0.81-0.86 F1), and living status type (0.81-0.93 F1) on data from three institutions.
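The abstract describes selecting samples for annotation with a surrogate text classification task, but it does not state the selection criterion. The sketch below illustrates one common active learning strategy, entropy-based uncertainty sampling, purely as an assumption and not as the authors' method. The function names, the `pool` input format, and the example probabilities are all hypothetical.

```python
from math import log


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * log(p) for p in probs if p > 0)


def select_for_annotation(pool, k):
    """Return the ids of the k samples with the most uncertain predictions.

    `pool` maps a sample id to the label distribution predicted by a
    surrogate text classifier (hypothetical input format, assumed here
    for illustration only).
    """
    ranked = sorted(pool, key=lambda sid: entropy(pool[sid]), reverse=True)
    return ranked[:k]


# Hypothetical surrogate predictions for three unlabeled social history sections.
pool = {
    "note_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "note_b": [0.40, 0.35, 0.25],  # uncertain prediction, high entropy
    "note_c": [0.70, 0.20, 0.10],
}

selected = select_for_annotation(pool, 2)
print(selected)
```

In this toy pool, the two least confident predictions are chosen for annotation. A criterion aimed at increasing the frequency of health risk factors, as the abstract reports, could instead rank samples by the predicted probability of a risk-related label; that variant is likewise an assumption.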