RECAST: Interactive Auditing of Automatic Toxicity Detection Models

Paper Title

RECAST: Interactive Auditing of Automatic Toxicity Detection Models

Authors

Wright, Austin P., Shaikh, Omar, Park, Haekyu, Epperson, Will, Ahmed, Muhammed, Pinel, Stephane, Yang, Diyi, Chau, Duen Horng

Abstract

As toxic language becomes nearly pervasive online, there has been increasing interest in leveraging advances in natural language processing (NLP), including very large transformer models, to automatically detect and remove toxic comments. Despite concerns about fairness, a lack of adversarial robustness, and the limited explainability of deep learning systems' predictions, there is currently little work on auditing these systems or on helping both developers and users understand how they work. We present our ongoing work, RECAST, an interactive tool for examining toxicity detection models by visualizing explanations for their predictions and providing alternative wordings for detected toxic speech.
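The abstract does not say how RECAST computes its explanations or alternative wordings. Purely as an illustrative sketch of the two ideas it names, the snippet below swaps in leave-one-out (occlusion) attribution and a greedy word-substitution loop over a toy lexicon scorer. Every name here (`TOXIC_WEIGHTS`, `ALTERNATIVES`, `toxicity`, `attributions`, `suggest_rewording`) is hypothetical, and a real auditing tool would query a trained classifier such as a transformer model instead of a word list.

```python
# Hypothetical toy lexicon standing in for a trained toxicity classifier.
TOXIC_WEIGHTS = {"idiot": 0.9, "stupid": 0.7, "hate": 0.5}
# Hypothetical softer replacements used to suggest alternative wordings.
ALTERNATIVES = {"idiot": "person", "stupid": "careless", "hate": "dislike"}


def toxicity(tokens):
    """Toy score in [0, 1]: 1 - prod(1 - w) over known toxic words."""
    p = 1.0
    for tok in tokens:
        p *= 1.0 - TOXIC_WEIGHTS.get(tok.lower(), 0.0)
    return 1.0 - p


def attributions(tokens):
    """Leave-one-out attribution: how much the score drops when a token is removed."""
    base = toxicity(tokens)
    return [(tok, base - toxicity(tokens[:i] + tokens[i + 1:]))
            for i, tok in enumerate(tokens)]


def suggest_rewording(tokens, threshold=0.5):
    """Greedily replace the most influential replaceable token until the score falls below threshold."""
    tokens = list(tokens)
    while toxicity(tokens) >= threshold:
        # Rank only tokens the toy lexicon knows how to soften.
        scored = [(attr, i) for i, (tok, attr) in enumerate(attributions(tokens))
                  if tok.lower() in ALTERNATIVES]
        if not scored:
            break  # nothing left to replace; give up rather than loop forever
        _, i = max(scored)
        tokens[i] = ALTERNATIVES[tokens[i].lower()]
    return tokens


if __name__ == "__main__":
    comment = "you are a stupid idiot".split()
    print(round(toxicity(comment), 2))          # 0.97
    print(attributions(comment))                # "idiot" carries the largest attribution
    print(" ".join(suggest_rewording(comment)))  # you are a careless person
```

Occlusion is used here only because it needs nothing beyond a black-box scoring function; gradient- or attention-based explanations are common alternatives for transformer classifiers.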
