Explainability of Traditional and Deep Learning Models on Longitudinal Healthcare Records

Paper Title

Explainability of Traditional and Deep Learning Models on Longitudinal Healthcare Records

Authors

Cheong, Lin Lee; Meharizghi, Tesfagabir; Black, Wynona; Guang, Yang; Meng, Weilin

Abstract

Recent advances in deep learning have led to interest in training deep learning models on longitudinal healthcare records to predict a range of medical events, with models demonstrating high predictive performance. Predictive performance is necessary but not sufficient, however: clinicians also need explanations and reasoning from models before they will commit to sustained use. Rigorous evaluation of explainability is often missing, as comparisons between models (traditional versus deep) and between explainability methods have not been well studied. Furthermore, the ground truths needed to evaluate explainability can be highly subjective, depending on the clinician's perspective. Our work is one of the first to evaluate explainability performance between and within traditional (XGBoost) and deep learning (LSTM with Attention) models, at both the global level and the individual per-prediction level, on longitudinal healthcare data. We compared explainability using three popular methods: 1) SHapley Additive exPlanations (SHAP), 2) Layer-Wise Relevance Propagation (LRP), and 3) Attention. These implementations were applied to synthetically generated datasets with designed ground truths and to a real-world Medicare claims dataset. We showed that, overall, LSTMs with SHAP or LRP provide superior explainability compared to XGBoost at both the global and local level, while LSTM with dot-product attention failed to produce reasonable explanations. With the explosion in the volume of healthcare data and continued progress in deep learning, evaluating explainability will be pivotal to the successful adoption of deep learning models in healthcare settings.
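To make the distinction between local (per-prediction) and global explainability concrete, the following is a minimal sketch of exact Shapley attribution on a toy model. It is not the paper's implementation and does not use the SHAP library: the function `shapley_values`, the model `toy_model`, the three binary features, and the all-zero baseline are all hypothetical and chosen only for illustration. Brute-force enumeration like this is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    This enumerates every coalition, so cost grows as 2^n (illustration only).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    # Hypothetical risk score: one additive term plus one interaction term
    a, b, c = features
    return 2.0 * a + 1.0 * b * c


# Hypothetical "patients", each described by three binary features
patients = [[1, 1, 1], [1, 0, 1], [0, 1, 1]]
baseline = [0, 0, 0]

# Local explanations: one attribution vector per prediction
local = [shapley_values(toy_model, p, baseline) for p in patients]

# Global explanation: mean absolute attribution per feature across patients
global_importance = [
    sum(abs(row[i]) for row in local) / len(local) for i in range(3)
]

for p, row in zip(patients, local):
    print(p, [round(v, 3) for v in row])
print("global:", [round(v, 3) for v in global_importance])
```

For each patient, the attributions sum to the difference between the model output for that patient and the output for the baseline, which is the additivity property that SHAP-style methods rely on. The global ranking here is simply an average of local magnitudes, one common convention among several.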
