Paper title
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification
Paper authors
Abstract
The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is neither a standardized evaluation protocol nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models and continuing to train them on Twitter corpora.