Paper Title

More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias

Paper Authors

Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky

Paper Abstract

Increased awareness of the risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. The vast majority of proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently, at the intersection of these two categories, methods that perform active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias present in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when label bias is overlooked, collecting more data can aggravate bias, and that imposing fairness constraints that rely on the observed labels during data collection may not address the problem. Our results illustrate the unintended consequences of deploying a model that attempts to mitigate a single type of bias while neglecting others, emphasizing the importance of explicitly differentiating between the types of bias that fairness-aware algorithms aim to address, and highlighting the risks of neglecting label bias during data collection.
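To make the phenomenon concrete, below is a minimal simulation sketch, not the authors' experimental setup. It assumes a hypothetical scenario: synthetic two-feature data, a label-bias mechanism that flips 40% of true positives in one group to negatives, and uncertainty-sampling active learning with logistic regression. The flip rate, model choice, and acquisition rule are all illustrative assumptions.

```python
# Illustrative sketch (hypothetical setup, not the paper's experiments):
# an active learner trains on biased observed labels, while we audit
# per-group accuracy against the unbiased ground truth.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.normal(size=(n, 2))                   # features
    a = rng.integers(0, 2, size=n)                # protected group membership
    y_true = (x[:, 0] + x[:, 1] > 0).astype(int)  # unbiased ground-truth labels
    # Assumed label-bias mechanism (illustrative only): true positives in
    # group a=1 are recorded as negatives with probability 0.4.
    flip = (a == 1) & (y_true == 1) & (rng.random(n) < 0.4)
    y_obs = np.where(flip, 0, y_true)
    return x, a, y_true, y_obs

x, a, y_true, y_obs = make_data(5000)
labeled = list(rng.choice(len(x), size=50, replace=False))
pool = [i for i in range(len(x)) if i not in set(labeled)]

for step in range(10):
    # Train on the *observed* (biased) labels, as a deployed system would.
    clf = LogisticRegression(max_iter=1000).fit(x[labeled], y_obs[labeled])
    # Uncertainty sampling: acquire the pool points closest to the boundary.
    proba = clf.predict_proba(x[pool])[:, 1]
    picked = [pool[i] for i in np.argsort(np.abs(proba - 0.5))[:100]]
    labeled += picked
    pool = [i for i in pool if i not in set(picked)]
    # Audit against the unbiased ground truth, per group: the accuracy gap
    # need not shrink as more (biased) labels are collected.
    pred = clf.predict(x)
    gap = abs((pred[a == 0] == y_true[a == 0]).mean()
              - (pred[a == 1] == y_true[a == 1]).mean())
    print(f"round {step}: labeled={len(labeled)}, group accuracy gap={gap:.3f}")
```

The point the sketch illustrates is that both the acquisition criterion and any fairness constraint computed from the observed labels see only the biased labels, so collecting more of them cannot, by itself, correct a disparity that is invisible in the observed data.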
