DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing

Paper title

DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing

Paper authors

Yanjun Gao, Dmitriy Dligach, Timothy Miller, John Caskey, Brihat Sharma, Matthew M. Churpek, Majid Afshar

Abstract

The meaningful use of electronic health records (EHR) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce the cognitive burden so fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error due to systematic or predictable errors in judgment that rely on heuristics. The potential for clinical natural language processing (cNLP) to model diagnostic reasoning in humans with forward reasoning from data to diagnosis and potentially reduce the cognitive burden and medical error has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks coined as Diagnostic Reasoning Benchmarks, DR.BENCH, as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed to be a natural language generation framework to evaluate pre-trained language models. Experiments with state-of-the-art pre-trained generative language models using large general domain models and models that were continually trained on a medical corpus demonstrate opportunities for improvement when evaluated in DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community.
