Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

Paper Title

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

Authors

Paul Röttger, Haitham Seelawi, Debora Nozza, Zeerak Talat, Bertie Vidgen

Abstract

Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially misleading picture of model performance because of increasingly well-documented systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, recent research has thus introduced functional tests for hate speech detection models. However, these tests currently only exist for English-language content, which means that they cannot support the development of more effective models in other languages spoken by billions across the world. To help address this issue, we introduce Multilingual HateCheck (MHC), a suite of functional tests for multilingual hate speech detection models. MHC covers 34 functionalities across ten languages, which is more languages than any other hate speech dataset. To illustrate MHC's utility, we train and test a high-performing multilingual hate speech detection model, and reveal critical model weaknesses for monolingual and cross-lingual applications.
