Paper Title
Demographics Should Not Be the Reason of Toxicity: Mitigating Discrimination in Text Classifications with Instance Weighting
Paper Authors
Paper Abstract
With the recent proliferation of the use of text classifications, researchers have found that there are certain unintended biases in text classification datasets. For example, texts containing some demographic identity-terms (e.g., "gay", "black") are more likely to be abusive in existing abusive language detection datasets. As a result, models trained with these datasets may consider sentences like "She makes me happy to be gay" as abusive simply because of the word "gay." In this paper, we formalize the unintended biases in text classification datasets as a kind of selection bias from the non-discrimination distribution to the discrimination distribution. Based on this formalization, we further propose a model-agnostic debiasing training framework by recovering the non-discrimination distribution using instance weighting, which does not require any extra resources or annotations apart from a pre-defined set of demographic identity-terms. Experiments demonstrate that our method can effectively alleviate the impacts of the unintended biases without significantly hurting models' generalization ability.
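The abstract describes instance weighting as a way to recover the non-discrimination distribution from biased data, but it does not give the exact weighting formula. Below is a minimal, hypothetical sketch of the standard importance-weighting correction for this kind of selection bias: each training instance is reweighted by the ratio of the target (non-discrimination) probability of its label to the observed (discrimination) probability, conditioned on whether the text mentions an identity term. The identity-term set, the helper names, and the approximation P_nondisc(y | z) ≈ P(y) are illustrative assumptions, not the paper's actual derivation.

```python
import numpy as np

# Illustrative sketch of selection-bias correction via instance weighting.
# The weighting scheme below is a hypothetical importance-weighting
# approximation, not the paper's derived formula.

IDENTITY_TERMS = {"gay", "black"}  # hypothetical pre-defined identity-term set


def has_identity_term(text):
    """Return True if the text mentions any identity term (z = 1)."""
    return any(tok in IDENTITY_TERMS for tok in text.lower().split())


def instance_weights(texts, labels):
    """Weight each instance by P_nondisc(y | z) / P_disc(y | z), where z
    indicates whether the text contains an identity term.

    Under a non-discrimination distribution the label is assumed independent
    of z, so P_nondisc(y | z) is approximated here by the marginal P(y);
    P_disc(y | z) is estimated from the (biased) training data.
    """
    labels = np.asarray(labels)
    z = np.array([has_identity_term(t) for t in texts])
    weights = np.ones(len(texts))
    for zi in (True, False):
        for yi in np.unique(labels):
            mask = (z == zi) & (labels == yi)
            p_disc = mask.sum() / max((z == zi).sum(), 1)  # empirical P(y | z)
            p_nondisc = (labels == yi).mean()              # marginal P(y) as target
            if p_disc > 0:
                weights[mask] = p_nondisc / p_disc
    return weights


# Toy usage: 0 = non-abusive, 1 = abusive. The resulting per-instance weights
# can be passed to any classifier's weighted loss (e.g., weighted
# cross-entropy), leaving the model architecture itself unchanged.
texts = ["she makes me happy to be gay", "you are an idiot", "have a nice day"]
labels = [0, 1, 0]
print(instance_weights(texts, labels))
```

In this toy run, the non-abusive sentence containing "gay" is down-weighted relative to the non-abusive sentence without identity terms, counteracting the over-association between identity terms and the abusive label; this is one common way to realize the model-agnostic reweighting idea the abstract outlines.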