Paper Title
Intra-Processing Methods for Debiasing Neural Networks
Paper Authors
Paper Abstract
As deep learning models become tasked with more and more decisions that impact human lives, such as criminal recidivism, loan repayment, and face recognition for law enforcement, bias is becoming a growing concern. Debiasing algorithms are typically split into three paradigms: pre-processing, in-processing, and post-processing. However, in computer vision or natural language applications, it is common to start with a large generic model and then fine-tune to a specific use-case. Pre- or in-processing methods would require retraining the entire model from scratch, while post-processing methods only have black-box access to the model, so they do not leverage the weights of the trained model. Creating debiasing algorithms specifically for this fine-tuning use-case has largely been neglected. In this work, we initiate the study of a new paradigm in debiasing research, intra-processing, which sits between in-processing and post-processing methods. Intra-processing methods are designed specifically to debias large models which have been trained on a generic dataset and fine-tuned on a more specific task. We show how to repurpose existing in-processing methods for this use-case, and we also propose three baseline algorithms: random perturbation, layerwise optimization, and adversarial fine-tuning. All of our techniques can be used for all popular group fairness measures such as equalized odds or statistical parity difference. We evaluate these methods across three popular datasets from the AIF360 toolkit, as well as on the CelebA faces dataset. Our code is available at https://github.com/abacusai/intraprocessing_debiasing.
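For reference, the group fairness measures named in the abstract can be computed in a few lines. The following is a minimal NumPy sketch of statistical parity difference and an equalized-odds gap; the function names and the binary `group` encoding (1 for the protected group, 0 otherwise) are illustrative assumptions, not the paper's or the AIF360 toolkit's API.

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """P(y_hat = 1 | group = 1) - P(y_hat = 1 | group = 0)."""
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()

def equalized_odds_difference(y_pred, y_true, group):
    """Largest gap between groups in true-positive and false-positive rates.

    Assumes binary labels/predictions and that both groups appear
    in each label slice (empty slices would yield NaN).
    """
    gaps = []
    for y in (0, 1):  # y = 1 gives the TPR gap, y = 0 the FPR gap
        mask = y_true == y
        rate = lambda g: y_pred[mask & (group == g)].mean()
        gaps.append(abs(rate(1) - rate(0)))
    return max(gaps)
```

A perfectly fair classifier under either measure scores 0; intra-processing methods aim to push these values toward 0 without retraining from scratch.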
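The "random perturbation" baseline can be pictured as a simple accept/reject search over weight perturbations of the fine-tuned model. This PyTorch sketch conveys the idea under stated assumptions: `objective` is a hypothetical callable scoring a model on validation data (e.g. accuracy minus a bias penalty, higher is better), and `sigma` and `steps` are illustrative hyperparameters. It is a sketch of the concept, not the repository's implementation.

```python
import copy
import torch

def random_perturbation(model, objective, steps=100, sigma=0.01):
    # Keep the best model seen so far (hill climbing in weight space).
    best = copy.deepcopy(model)
    best_score = objective(best)
    for _ in range(steps):
        candidate = copy.deepcopy(best)
        with torch.no_grad():
            # Add Gaussian noise to every weight tensor.
            for p in candidate.parameters():
                p.add_(sigma * torch.randn_like(p))
        score = objective(candidate)
        if score > best_score:  # accept only improving perturbations
            best, best_score = candidate, score
    return best
```

Because the search only needs forward passes through `objective`, it applies to any of the group fairness measures above by folding the chosen measure into the objective.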