论文标题
您的公平模型有多稳健?探索各种公平策略的鲁棒性
How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies
论文作者
论文摘要
随着在高风险决策中引入机器学习,确保算法公平已成为越来越重要的问题。为此,已经提出了许多关于公平性的数学定义,并且已经开发了多种优化技术,所有这些都旨在最大化明确的公平概念。但是,公平解决方案依赖于训练数据的质量,并且对噪声高度敏感。最近的研究表明,鲁棒性(模型在看不见的数据上表现良好的能力)在处理新问题时应使用的策略类型起着重要作用,因此,衡量这些策略的鲁棒性已成为一个根本问题。因此,在这项工作中,我们提出了一个新标准,以衡量各种公平优化策略的鲁棒性 - 稳健性比率。我们使用三种最受欢迎的公平策略在五个最受欢迎的公平定义方面,在五个基准标记公平数据集上进行了多次广泛的实验。从经验上讲,我们的实验表明,尽管大多数表现优于其他方法,但依赖于阈值优化的公平方法对所有评估的数据集中的噪声非常敏感。这与其他两种方法相反,对于低噪声方案而言,这对高噪声方面的情况不太公平。据我们所知,我们是第一个定量评估公平优化策略的鲁棒性的人。这可以作为为各种数据集选择最合适的公平策略的指南。
With the introduction of machine learning in high-stakes decision making, ensuring algorithmic fairness has become an increasingly important problem to solve. In response to this, many mathematical definitions of fairness have been proposed, and a variety of optimisation techniques have been developed, all designed to maximise a defined notion of fairness. However, fair solutions are reliant on the quality of the training data, and can be highly sensitive to noise. Recent studies have shown that robustness (the ability for a model to perform well on unseen data) plays a significant role in the type of strategy that should be used when approaching a new problem and, hence, measuring the robustness of these strategies has become a fundamental problem. In this work, we therefore propose a new criterion to measure the robustness of various fairness optimisation strategies - the robustness ratio. We conduct multiple extensive experiments on five bench mark fairness data sets using three of the most popular fairness strategies with respect to four of the most popular definitions of fairness. Our experiments empirically show that fairness methods that rely on threshold optimisation are very sensitive to noise in all the evaluated data sets, despite mostly outperforming other methods. This is in contrast to the other two methods, which are less fair for low noise scenarios but fairer for high noise ones. To the best of our knowledge, we are the first to quantitatively evaluate the robustness of fairness optimisation strategies. This can potentially can serve as a guideline in choosing the most suitable fairness strategy for various data sets.