Language Models Are an Effective Patient Representation Learning Technique for Electronic Health Record Data

Paper Title

Language Models Are An Effective Patient Representation Learning Technique For Electronic Health Record Data

Authors

Steinberg, Ethan; Jung, Ken; Fries, Jason A.; Corbin, Conor K.; Pfohl, Stephen R.; Shah, Nigam H.

Abstract

Widespread adoption of electronic health records (EHRs) has fueled the use of machine learning to build prediction models for various clinical outcomes. This process is often constrained by the relatively small number of patient records available for training the model. We demonstrate that patient representation schemes inspired by techniques in natural language processing can increase the accuracy of clinical prediction models by transferring information learned from the entire patient population to the task of training a specific model, where only a subset of the population is relevant. Such patient representation schemes enable a 3.5% mean improvement in AUROC on five prediction tasks compared to standard baselines, with the average improvement rising to 19% when only a small number of patient records are available for training the clinical prediction model.
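The results above are reported as AUROC (area under the receiver operating characteristic curve). As a generic illustration of that metric, and not the authors' evaluation code, the sketch below computes AUROC with the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen positive patient is scored higher than a randomly chosen negative one, with ties counted as one half. The function name `auroc` is hypothetical; in practice a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` would typically be used.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUROC via pairwise (Mann-Whitney) comparison.

    labels: 1 for a positive outcome, 0 for a negative outcome.
    scores: model-predicted risk for each patient, same order as labels.
    """
    # Split the predicted scores by true outcome.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")

    # Count correctly ordered positive/negative pairs; a tie counts as half.
    # This is O(len(pos) * len(neg)), fine for a small illustration.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: of the four positive/negative pairs, three are ranked in the correct order. An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation.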
