Paper Title
Bidirectional Representation Learning from Transformers using Multimodal Electronic Health Record Data to Predict Depression
Authors
Abstract
Advancements in machine learning algorithms have had a beneficial impact on representation learning, classification, and prediction models built using electronic health record (EHR) data. Efforts have focused both on increasing models' overall performance and on improving their interpretability, particularly regarding the decision-making process. In this study, we present a temporal deep learning model that performs bidirectional representation learning on EHR sequences with a transformer architecture to predict a future diagnosis of depression. The model aggregates five heterogeneous, high-dimensional data sources from the EHR and processes them in a temporal manner for chronic disease prediction at various prediction windows. We applied the current trend of pretraining and fine-tuning to EHR data to outperform the current state of the art in chronic disease prediction and to demonstrate the underlying relations between EHR codes in a sequence. In depression prediction, the model achieved the largest gain in area under the precision-recall curve (PRAUC), from 0.70 to 0.76, compared with the best baseline model. Furthermore, the self-attention weights in each sequence quantitatively demonstrated the inner relationships between various codes, which improved the model's interpretability. These results demonstrate the model's ability to use heterogeneous EHR data to predict depression while achieving high accuracy and interpretability, which may facilitate the future construction of clinical decision support systems for chronic disease screening and early detection.
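The abstract reports PRAUC but does not say which estimator was used to compute it. As a minimal sketch, assuming the common step-wise "average precision" estimate and assuming all predicted scores are distinct, the metric could be computed as follows. The function name `average_precision` and the toy labels and scores are hypothetical illustrations, not data or code from the paper.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: list of 0/1 ground-truth values (1 = positive, e.g. later diagnosed).
    scores: list of predicted probabilities, assumed distinct (no tie handling).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            # Precision at the threshold where this positive is first recalled.
            precision_sum += true_positives / rank

    # Each positive contributes a recall step of 1 / total_positives.
    return precision_sum / total_positives


# Toy example (made-up values, for illustration only).
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(average_precision(toy_labels, toy_scores), 3))  # 29/36, prints 0.806
```

PRAUC is often preferred over ROC AUC when positives are rare, because precision is computed only over the examples predicted positive and is therefore sensitive to false positives in a way that the false positive rate is not.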