Paper Title
Adversarial Learning for Counterfactual Fairness
Paper Authors
Paper Abstract
In recent years, fairness has become an important topic in the machine learning research community. In particular, counterfactual fairness aims at building prediction models that ensure fairness at the most individual level. Rather than considering equity globally over the entire population, the idea is to imagine what any individual would look like under a variation of a given attribute of interest, such as a different gender or race. Existing approaches rely on variational auto-encoding of individuals, using a Maximum Mean Discrepancy (MMD) penalty to limit the statistical dependence between the inferred representations and their corresponding sensitive attributes. This enables the simulation of counterfactual samples used for training the target fair model, the goal being to produce similar outcomes for every alternate version of any individual. In this work, we propose to rely on an adversarial neural learning approach, which enables more powerful inference than MMD penalties and is particularly better suited to the continuous setting, where the values of sensitive attributes cannot be exhaustively enumerated. Experiments show significant improvements in terms of counterfactual fairness in both the discrete and the continuous settings.
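To make the adversarial formulation concrete, below is a minimal PyTorch sketch of the kind of min-max training the abstract alludes to: an adversary network tries to predict the sensitive attribute from the inferred representation, while the encoder is trained to defeat it, pushing the representation toward statistical independence from the attribute. This is a sketch under assumptions, not the authors' exact model: all architecture choices, the names Encoder, Adversary, training_step, and the weight lambda_adv are illustrative, and the task/reconstruction loss of the full model is omitted.

    import torch
    import torch.nn as nn

    # Hypothetical dimensions, chosen only for illustration.
    INPUT_DIM, LATENT_DIM, SENSITIVE_DIM = 32, 16, 1

    class Encoder(nn.Module):
        """Maps an individual x to a latent code z that should carry
        no information about the sensitive attribute s."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(INPUT_DIM, 64), nn.ReLU(),
                nn.Linear(64, LATENT_DIM))

        def forward(self, x):
            return self.net(x)

    class Adversary(nn.Module):
        """Tries to recover the (possibly continuous) sensitive
        attribute from the latent code z."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                nn.Linear(64, SENSITIVE_DIM))

        def forward(self, z):
            return self.net(z)

    encoder, adversary = Encoder(), Adversary()
    opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
    mse = nn.MSELoss()

    def training_step(x, s, lambda_adv=1.0):
        # 1) Adversary step: minimize its error at predicting s
        #    from a detached latent code.
        z = encoder(x).detach()
        adv_loss = mse(adversary(z), s)
        opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

        # 2) Encoder step: maximize the adversary's error (the full
        #    model would add its reconstruction/prediction loss here),
        #    driving z toward independence from s.
        z = encoder(x)
        enc_loss = -lambda_adv * mse(adversary(z), s)
        opt_enc.zero_grad(); enc_loss.backward(); opt_enc.step()

    # Illustrative usage on random data; s is continuous here, which
    # is the setting where enumerating counterfactual attribute values
    # for an MMD-style penalty is not feasible.
    x = torch.randn(128, INPUT_DIM)
    s = torch.randn(128, SENSITIVE_DIM)
    training_step(x, s)

Using a regression adversary in this way handles continuous sensitive attributes naturally, since nothing in the loop requires enumerating attribute values; an MMD penalty, by contrast, compares representation distributions across discrete attribute groups.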