Paper title
Potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms
Paper authors
Paper abstract
An increasing number of reports raise concerns about the risk that machine learning algorithms could amplify health disparities due to biases embedded in the training data. Seyyed-Kalantari et al. find that models trained on three chest X-ray datasets yield disparities in false-positive rates (FPRs) across subgroups on the 'no-finding' label (indicating the absence of disease). The models consistently yield higher FPRs on subgroups known to be historically underserved, and the study concludes that the models exhibit, and potentially even amplify, systematic underdiagnosis. We argue that the experimental setup in the study is insufficient to study algorithmic underdiagnosis. In the absence of specific knowledge (or assumptions) about the extent and nature of the dataset bias, it is difficult to investigate model bias. Importantly, the study's use of test data exhibiting the same bias as the training data (due to random splitting) severely complicates the interpretation of the reported disparities.
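
To make the last point concrete, the following toy simulation (not from the paper) assumes one specific, hypothetical form of label bias: for one subgroup, some truly healthy patients have a finding recorded in their labels, in both the training and the test split. Under that assumption, even an oracle model with no bias of its own shows a large measured FPR disparity on the 'no-finding' label, because the disparity is produced entirely by the noisy test labels. Subgroup names, prevalences, and noise rates below are illustrative choices, not values from the datasets in the study.

```python
# Toy simulation: group-dependent label noise shared by train and test
# splits can inflate the measured FPR of the 'no finding' label for an
# unbiased model. All numbers here are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
group = rng.choice(["A", "B"], size=n)           # two subgroups, equal size
truly_diseased = rng.random(n) < 0.3             # same true prevalence in both

# Assumed label bias: in subgroup B, 15% of truly healthy patients
# receive a spurious 'finding' label (shared by train and test splits).
label_diseased = truly_diseased.copy()
noisy = (group == "B") & ~truly_diseased & (rng.random(n) < 0.15)
label_diseased[noisy] = True

# Oracle model: predicts 'no finding' exactly when the patient is
# truly healthy, i.e. a model with zero bias of its own.
pred_no_finding = ~truly_diseased

# The study's underdiagnosis metric: FPR of the 'no finding' label,
# i.e. the fraction of patients labelled with a finding for whom the
# model predicts 'no finding'.
for g in ("A", "B"):
    mask = (group == g) & label_diseased
    fpr = pred_no_finding[mask].mean()
    print(f"subgroup {g}: FPR on 'no finding' = {fpr:.3f}")

# Prints roughly 0.000 for subgroup A and 0.26 for subgroup B: an
# apparent 'underdiagnosis' disparity caused entirely by label noise.
```

The point of this sketch is not that this particular bias exists in the chest X-ray datasets, but that when the test labels can carry the same unknown bias as the training labels, a measured FPR disparity cannot by itself be attributed to model underdiagnosis.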