Paper Title
Certifying Some Distributional Fairness with Subpopulation Decomposition

Paper Authors

Mintong Kang, Linyi Li, Maurice Weber, Yang Liu, Ce Zhang, Bo Li

Paper Abstract

Extensive efforts have been made to understand and improve the fairness of machine learning models based on observational metrics, especially in high-stakes domains such as medical insurance, education, and hiring decisions. However, there is a lack of certified fairness considering the end-to-end performance of an ML model. In this paper, we first formulate the certified fairness of an ML model trained on a given data distribution as an optimization problem based on the model performance loss bound on a fairness-constrained distribution, which is within a bounded distributional distance of the training distribution. We then propose a general fairness certification framework and instantiate it for both sensitive shifting and general shifting scenarios. In particular, we propose to solve the optimization problem by decomposing the original data distribution into analytical subpopulations and proving the convexity of the subproblems to solve them. We evaluate our certified fairness on six real-world datasets and show that our certification is tight in the sensitive shifting scenario and provides non-trivial certification under general shifting. Our framework can flexibly integrate additional non-skewness constraints, and we show that it provides even tighter certification under different real-world scenarios. We also compare our certified fairness bound with adapted existing distributional robustness bounds on Gaussian data and demonstrate that our method is significantly tighter.
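
At a high level, the certification problem described in the abstract can be read as a constrained optimization over shifted distributions. The following schematic formulation is our own rendering of that description; the notation (P, Q, f_theta, the loss ell, the distance D, and the budget rho) is chosen for illustration and is not necessarily the paper's:

```latex
\max_{Q}\;\; \mathbb{E}_{(x,y)\sim Q}\!\left[\ell\big(f_\theta(x),\, y\big)\right]
\quad \text{s.t.}\quad D(Q, P) \le \rho, \qquad Q \in \mathcal{F}
```

Here P is the training distribution, D is a distributional distance with shift budget rho, and F is the set of distributions satisfying the fairness constraint; the optimal value upper-bounds the model's loss on any admissible fairness-constrained shifted distribution.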
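To make the subpopulation-decomposition idea concrete, below is a minimal, self-contained Python sketch. It only reweights subpopulations (e.g., groups defined by a sensitive attribute and label) within a distance budget, loosely analogous to the sensitive shifting scenario; the squared-Hellinger distance, the toy data, and all names are our own assumptions, not the paper's implementation:

```python
# Illustrative sketch of subpopulation decomposition: split data into
# subpopulations, then search over subpopulation weights within a
# distance budget to upper-bound the expected loss under a shift.
# This is a toy stand-in, not the paper's certification algorithm.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Per-sample losses grouped into four subpopulations, e.g. by (group, label).
subpop_losses = [rng.exponential(scale=s, size=200) for s in (0.4, 0.6, 0.9, 1.1)]
mean_losses = np.array([l.mean() for l in subpop_losses])  # E[loss | subpop k]
p = np.array([0.3, 0.2, 0.3, 0.2])                         # training weights

def hellinger_sq(q, p):
    # Squared Hellinger distance between subpopulation weight vectors.
    return 0.5 * np.sum((np.sqrt(q) - np.sqrt(p)) ** 2)

rho = 0.01  # assumed distance budget on the shift

def neg_worst_loss(q):
    # Maximize expected loss over reweighted subpopulations -> minimize negative.
    return -np.dot(q, mean_losses)

cons = [
    {"type": "eq", "fun": lambda q: q.sum() - 1.0},               # valid distribution
    {"type": "ineq", "fun": lambda q: rho - hellinger_sq(q, p)},  # within budget
]
res = minimize(neg_worst_loss, x0=p, bounds=[(0.0, 1.0)] * len(p), constraints=cons)
print("training loss:", np.dot(p, mean_losses))
print("worst-case bound (sketch):", -res.fun)
```

Running this prints the empirical training loss and a sketch-level worst-case bound. The point it mirrors from the abstract is structural: after decomposition, each subproblem is convex in the subpopulation weights, so a solver can certify its optimum.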
