REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets

Paper title

REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets

Authors

Angelina Wang, Alexander Liu, Ryan Zhang, Anat Kleiman, Leslie Kim, Dora Zhao, Iroha Shirai, Arvind Narayanan, Olga Russakovsky

Abstract

Machine learning models are known to perpetuate and even amplify the biases present in the data. However, these data biases frequently do not become apparent until after the models are deployed. Our work tackles this issue and enables the preemptive analysis of large-scale datasets. REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset, surfacing potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based. Object-based biases relate to the size, context, or diversity of the depicted objects. Person-based metrics focus on analyzing the portrayal of people within the dataset. Geography-based analyses consider the representation of different geographic locations. These three dimensions are deeply intertwined in how they interact to bias a dataset, and REVISE sheds light on this; the responsibility then lies with the user to consider the cultural and historical context, and to determine which of the revealed biases may be problematic. The tool further assists the user by suggesting actionable steps that may be taken to mitigate the revealed biases. Overall, the key aim of our work is to tackle the machine learning bias problem early in the pipeline. REVISE is available at https://github.com/princetonvisualai/revise-tool
