Title
Diagnosis Prevalence vs. Efficacy in Machine-learning Based Diagnostic Decision Support
Authors
Abstract
Many recent studies use machine learning to predict a small number of ICD-9-CM codes. In practice, however, physicians must consider a much broader range of diagnoses. This study aims to put these previously incongruent evaluation settings on a more equal footing in two ways. It predicts ICD-9-CM codes from electronic health record properties, and it demonstrates the relationship between diagnosis prevalence and system performance. We extracted patient features from the MIMIC-III dataset for each admission and trained and evaluated 43 different machine learning classifiers. Of these, the most successful was a Multi-Layer Perceptron. In line with general machine learning expectations, all classifiers' F1 scores dropped as disease prevalence decreased. Scores fell from 0.28 for the 50 most prevalent ICD-9-CM codes to 0.03 for the 1000 most prevalent ICD-9-CM codes. Statistical analyses showed a moderate positive correlation (0.5866) between disease prevalence and efficacy.
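The abstract relates how often a code occurs to how well classifiers predict it. The sketch below shows one way such an analysis could be computed. It is not the authors' code, and it rests on several assumptions. Each admission's diagnoses are taken to be a set of ICD-9-CM code strings. F1 is computed per code and macro-averaged over the top-k codes. The correlation measure is Pearson's r. The abstract specifies neither the averaging scheme nor the correlation coefficient. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def code_prevalence(label_sets):
    """Fraction of admissions whose diagnosis set contains each code."""
    n = len(label_sets)
    counts = Counter(code for labels in label_sets for code in set(labels))
    return {code: count / n for code, count in counts.items()}


def top_k_codes(prevalence, k):
    """The k most prevalent codes, ties broken alphabetically."""
    return sorted(prevalence, key=lambda code: (-prevalence[code], code))[:k]


def per_code_f1(true_sets, pred_sets, code):
    """Binary F1 for one code, treating each admission as one example."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        in_truth, in_pred = code in truth, code in pred
        tp += int(in_truth and in_pred)
        fp += int(in_pred and not in_truth)
        fn += int(in_truth and not in_pred)
    # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(true_sets, pred_sets, codes):
    """Unweighted mean of per-code F1 (assumed averaging scheme)."""
    return sum(per_code_f1(true_sets, pred_sets, c) for c in codes) / len(codes)


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed correlation measure)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Toy data (hypothetical): gold and predicted code sets per admission.
    true_sets = [{"A", "B"}, {"A"}, {"A", "C"}, {"B"}]
    pred_sets = [{"A"}, {"A"}, {"A", "B"}, set()]
    prev = code_prevalence(true_sets)
    # Macro F1 over progressively larger sets of prevalent codes.
    for k in (2, 3):
        print(k, macro_f1(true_sets, pred_sets, top_k_codes(prev, k)))
    # Correlate each code's prevalence with its own F1.
    codes = top_k_codes(prev, 3)
    f1s = [per_code_f1(true_sets, pred_sets, c) for c in codes]
    print(pearson([prev[c] for c in codes], f1s))
```

On this toy data, macro F1 falls from 0.5 to about 0.33 as k grows from 2 to 3, because the added code is rare and never predicted correctly. That mirrors in miniature the trend the abstract reports when moving from the 50 to the 1000 most prevalent codes. A real replication would swap in a library implementation such as scikit-learn's F1 metrics and SciPy's correlation functions.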