Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations

Paper Title

Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations

Authors

Pinkesh Badjatiya, Manish Gupta, Vasudeva Varma

Abstract

With the ever-increasing spread of hate on social media platforms, it is critical to design abuse detection mechanisms that proactively avoid and control such incidents. While methods for hate speech detection exist, they stereotype words and hence suffer from inherently biased training. Bias removal has traditionally been studied for structured datasets, but we aim at bias mitigation for unstructured text data. In this paper, we make two important contributions. First, we systematically design methods to quantify the bias of any model and propose algorithms for identifying the set of words that the model stereotypes. Second, we propose novel methods that leverage knowledge-based generalizations for bias-free learning. Knowledge-based generalization provides an effective way to encode knowledge because the abstraction it provides not only generalizes content but also facilitates retraction of information from the hate speech detection classifier, thereby reducing the imbalance. We experiment with multiple knowledge generalization policies and analyze their effect on general performance and on bias mitigation. Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset (WikiDetox) of size ~96k and a Twitter dataset of size ~24k, show that the use of knowledge-based generalizations results in better performance by forcing the classifier to learn from generalized content. Our methods utilize existing knowledge bases and can easily be extended to other tasks.
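The abstract does not spell out the generalization policies, so the following is only a minimal sketch of the general idea: words that a model is suspected of stereotyping are replaced by a broader concept before the text reaches the classifier. The mapping `HYPOTHETICAL_GENERALIZATIONS`, the placeholder tokens, and the function `generalize` are all assumptions made for illustration; they stand in for a real knowledge base and are not the authors' implementation.

```python
import re

# Hypothetical word-to-concept map standing in for a real knowledge base.
# Neither the words nor the placeholder tokens come from the paper.
HYPOTHETICAL_GENERALIZATIONS = {
    "muslims": "<RELIGIOUS_GROUP>",
    "christians": "<RELIGIOUS_GROUP>",
    "women": "<GENDER_GROUP>",
    "men": "<GENDER_GROUP>",
}


def generalize(text, stereotyped_words, mapping=HYPOTHETICAL_GENERALIZATIONS):
    """Replace each stereotyped word with its broader concept, if one is known.

    The text is lowercased and split into word and punctuation tokens.
    Tokens outside `stereotyped_words`, or without a mapping entry,
    are left unchanged.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    generalized = [
        mapping.get(token, token) if token in stereotyped_words else token
        for token in tokens
    ]
    return " ".join(generalized)


if __name__ == "__main__":
    # Only "women" is flagged here, so "men" would be left as-is.
    print(generalize("Women are great engineers.", {"women"}))
```

Under these assumptions, a classifier trained on the generalized text sees the same placeholder for every member of a category, so it cannot attach a label to one specific group word.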
