Paper Title

Measuring What Counts: The Case of Rumour Stance Classification

Authors

Carolina Scarton, Diego F. Silva, Kalina Bontcheva

Abstract

Stance classification can be a powerful tool for understanding whether and which users believe in online rumours. The task aims to automatically predict the stance of replies towards a given rumour, namely support, deny, question, or comment. Numerous methods have been proposed and their performance compared in the RumourEval shared tasks in 2017 and 2019. Results demonstrated that this is a challenging problem since naturally occurring rumour stance data is highly imbalanced. This paper specifically questions the evaluation metrics used in these shared tasks. We re-evaluate the systems submitted to the two RumourEval tasks and show that the two widely adopted metrics -- accuracy and macro-F1 -- are not robust for the four-class imbalanced task of rumour stance classification, as they wrongly favour systems with highly skewed accuracy towards the majority class. To overcome this problem, we propose new evaluation metrics for rumour stance detection. These are not only robust to imbalanced data but also assign higher scores to systems capable of recognising the two most informative minority classes (support and deny).
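To make the metric critique concrete, here is a minimal sketch (not from the paper: the label counts and both systems' outputs are invented for illustration, and scikit-learn's accuracy_score and f1_score are assumed to be available) showing how a degenerate majority-class baseline scores well on comment-heavy stance data while contributing nothing on support and deny.

```python
# Illustrative sketch, not the paper's experiments: how accuracy can
# favour a majority-class-biased system on imbalanced 4-class data.
# The counts below are invented, loosely mimicking the comment-heavy
# skew of rumour stance datasets. Requires scikit-learn.
from sklearn.metrics import accuracy_score, f1_score

LABELS = ["support", "deny", "query", "comment"]

# Hypothetical gold labels: 80 comment, 10 support, 5 deny, 5 query.
gold = ["comment"] * 80 + ["support"] * 10 + ["deny"] * 5 + ["query"] * 5

# System A: predicts the majority class for every reply.
pred_majority = ["comment"] * 100

# System B: mislabels some comments but recovers all minority classes.
pred_minority_aware = (
    ["comment"] * 50 + ["support"] * 15 + ["deny"] * 10 + ["query"] * 5
    + ["support"] * 10  # all 10 gold support found
    + ["deny"] * 5      # all 5 gold deny found
    + ["query"] * 5     # all 5 gold query found
)

for name, pred in [("majority-only", pred_majority),
                   ("minority-aware", pred_minority_aware)]:
    acc = accuracy_score(gold, pred)
    macro = f1_score(gold, pred, labels=LABELS, average="macro",
                     zero_division=0)
    print(f"{name}: accuracy={acc:.2f}, macro-F1={macro:.2f}")
```

Under these invented counts the majority-only baseline wins on accuracy (0.80 vs 0.70) despite never identifying a single support or deny reply, and its macro-F1 of roughly 0.22 is non-zero purely because the comment class's F1 is averaged in. This is the kind of distortion the paper's proposed metrics are designed to penalise.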
