Title
OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings
Authors
Abstract
Language representations are known to carry stereotypical biases and, as a result, lead to biased predictions in downstream tasks. While existing methods are effective at mitigating biases by linear projection, such methods are too aggressive: they not only remove bias, but also erase valuable information from word embeddings. We develop new measures for evaluating specific information retention that demonstrate the tradeoff between bias removal and information retention. To address this challenge, we propose OSCaR (Orthogonal Subspace Correction and Rectification), a bias-mitigating method that focuses on disentangling biased associations between concepts instead of removing concepts wholesale. Our experiments on gender biases show that OSCaR is a well-balanced approach that ensures that semantic information is retained in the embeddings and bias is also effectively mitigated.
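The "linear projection" the abstract criticizes typically means removing each embedding's component along an estimated bias direction, which zeroes out everything the direction carries, including legitimate semantic content. A minimal sketch of that baseline (toy random embeddings and a hypothetical `bias_dir`; not the OSCaR method itself):

```python
import numpy as np

def project_out(vecs, bias_dir):
    """Linear-projection debiasing: subtract each embedding's
    component along the (unit-normalized) bias direction."""
    b = bias_dir / np.linalg.norm(bias_dir)
    return vecs - np.outer(vecs @ b, b)

# Toy embeddings and a hypothetical bias direction (e.g. he - she)
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 8))
g = rng.normal(size=8)
E_debiased = project_out(E, g)

# Every debiased vector is now orthogonal to the bias direction,
# so all information correlated with it is gone -- the "too
# aggressive" behavior the abstract describes.
print(np.allclose(E_debiased @ (g / np.linalg.norm(g)), 0.0))
```

OSCaR instead applies a graded correction that disentangles two concept subspaces (e.g. gender and occupation) rather than deleting one outright, which is why it retains more semantic information under the paper's new retention measures.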