Paper Title
Reading Between the Demographic Lines: Resolving Sources of Bias in Toxicity Classifiers
Paper Authors
Paper Abstract
The censorship of toxic comments is often left to the judgment of imperfect models. Perspective API, a creation of Google's technology incubator Jigsaw, is perhaps the most widely used toxicity classifier in industry; the model is employed by several online platforms, including The New York Times, to identify and filter out toxic comments with the goal of preserving online safety. Unfortunately, Google's model tends to unfairly assign higher toxicity scores to comments containing words that refer to the identities of commonly targeted groups (e.g., "woman," "gay," etc.) because these identities are frequently referenced in a disrespectful manner in the training data. As a result, comments from members of marginalized groups that reference their own identities are often mistakenly censored. It is important to be cognizant of this unintended bias and to strive to mitigate its effects. To address this issue, we have constructed several toxicity classifiers with the intention of reducing unintended bias while maintaining strong classification performance.
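To make the scoring behavior described above concrete, here is a minimal sketch (not from the paper) of how one might probe the Perspective API for identity-term bias. It uses the public `commentanalyzer` endpoint with the `TOXICITY` attribute; the example comments and the `PERSPECTIVE_API_KEY` environment variable are illustrative assumptions.

```python
import os
import requests

# Assumption: an API key for the Perspective API is available in the
# environment; the endpoint below is the public v1alpha1 analyze route.
API_KEY = os.environ["PERSPECTIVE_API_KEY"]
URL = (
    "https://commentanalyzer.googleapis.com/v1alpha1/"
    f"comments:analyze?key={API_KEY}"
)

def toxicity_score(text: str) -> float:
    """Return the Perspective API TOXICITY probability for a comment."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload, timeout=10)
    response.raise_for_status()
    body = response.json()
    return body["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Hypothetical probe: a non-toxic comment that mentions a frequently
# targeted identity can receive an inflated score relative to an
# otherwise similar comment without the identity term.
for comment in ["I am a proud gay man.", "I am a proud man."]:
    print(f"{comment!r}: {toxicity_score(comment):.3f}")
```

Comparing scores across such minimally differing comment pairs is one simple way to surface the unintended identity-term bias that the classifiers in this work aim to reduce.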