Title
Debiasing Skin Lesion Datasets and Models? Not So Fast
Authors
Abstract
Data-driven models are now deployed in a plethora of real-world applications, including automated diagnosis, but models learned from data risk learning biases from that same data. When models learn spurious correlations not found in real-world situations, their deployment for critical tasks, such as medical decisions, can be catastrophic. In this work we address this issue for skin-lesion classification models, with two objectives: finding out which spurious correlations biased networks exploit, and debiasing the models by removing those spurious correlations from them. We perform a systematic, integrated analysis of 7 visual artifacts (possible sources of bias exploitable by networks), employ a state-of-the-art technique to prevent the models from learning spurious correlations, and propose datasets to test models for the presence of bias. We find that, despite interesting results pointing to promising future research, current debiasing methods are not ready to solve the bias issue for skin-lesion models.
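The kind of bias test the abstract alludes to can be illustrated with a toy experiment: train a classifier on data where an "artifact" feature (e.g. a ruler marker) spuriously correlates with the label, then evaluate on a split where that correlation is reversed. This is a hedged, minimal sketch, not the paper's actual method or datasets; the feature names, the synthetic data, and the plain-numpy logistic regression are all hypothetical stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical synthetic setup: a "lesion" feature that is weakly but
# genuinely predictive, and an "artifact" feature that is near-perfectly
# correlated with the label ONLY in the training split.
y_train = rng.integers(0, 2, n)
lesion_train = y_train + rng.normal(0, 1.0, n)        # weak real signal
artifact_train = y_train + rng.normal(0, 0.1, n)      # spurious, near-perfect

y_test = rng.integers(0, 2, n)
lesion_test = y_test + rng.normal(0, 1.0, n)
artifact_test = (1 - y_test) + rng.normal(0, 0.1, n)  # correlation reversed

X_train = np.column_stack([lesion_train, artifact_train])
X_test = np.column_stack([lesion_test, artifact_test])

def fit(X, y, lr=0.5, steps=3000):
    """Minimal logistic regression trained by full-batch gradient descent."""
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y                      # gradient of the logistic loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def accuracy(w, b, X, y):
    return float((((X @ w + b) > 0).astype(int) == y).mean())

w, b = fit(X_train, y_train)
acc_train = accuracy(w, b, X_train, y_train)    # high: artifact separates classes
acc_shifted = accuracy(w, b, X_test, y_test)    # collapses once the artifact flips
print(f"train accuracy: {acc_train:.2f}, shifted-test accuracy: {acc_shifted:.2f}")
```

The large gap between in-distribution and shifted-test accuracy is the signature of a model that leaned on the spurious feature rather than the true signal, which is what artifact-based test sets are designed to expose.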