Title
NEU at WNUT-2020 Task 2: Data Augmentation To Tell BERT That Death Is Not Necessarily Informative
Authors
Abstract
Millions of people around the world are sharing COVID-19-related information on social media platforms. Since not all information shared on social media is useful, a machine learning system that identifies informative posts can help users find relevant information. In this paper, we present a BERT classifier system for WNUT-2020 Shared Task 2: Identification of Informative COVID-19 English Tweets. Further, we show that BERT exploits some easy signals to identify informative tweets, and that adding simple patterns to uninformative tweets drastically degrades BERT's performance. In particular, simply adding "10 deaths" to tweets in the dev set reduces BERT's F1-score from 92.63 to 7.28. We also propose a simple data augmentation technique that helps improve the robustness and generalization ability of the BERT classifier.
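The following minimal Python sketch illustrates the kind of perturbation probe the abstract describes. Several pieces are hypothetical stand-ins and do not come from the authors' code or a real BERT model:

- the label strings `INFORMATIVE` and `UNINFORMATIVE`,
- the helper names `add_pattern`, `f1_positive`, and `evaluate`,
- the four toy tweets,
- the keyword rule `shortcut_classifier`.

The sketch appends a pattern such as "10 deaths" to uninformative tweets, then measures F1 on the informative class before and after the edit.

```python
from typing import Callable, List

# Assumed label names for the two classes; the positive class is INFORMATIVE.
INFORMATIVE = "INFORMATIVE"
UNINFORMATIVE = "UNINFORMATIVE"


def add_pattern(tweets: List[str], labels: List[str], pattern: str,
                target: str = UNINFORMATIVE) -> List[str]:
    """Append `pattern` to every tweet whose gold label is `target`; others are unchanged."""
    return [f"{t} {pattern}" if y == target else t for t, y in zip(tweets, labels)]


def f1_positive(gold: List[str], pred: List[str], positive: str = INFORMATIVE) -> float:
    """F1 of the positive class from true positives, false positives and false negatives."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def shortcut_classifier(tweet: str) -> str:
    """Hypothetical stand-in for a model that relies on the surface cue 'deaths'."""
    return INFORMATIVE if "deaths" in tweet.lower() else UNINFORMATIVE


def evaluate(classify: Callable[[str], str], tweets: List[str], labels: List[str]) -> float:
    """Classify every tweet and score the predictions with positive-class F1."""
    return f1_positive(labels, [classify(t) for t in tweets])


# Toy data (invented for illustration, not from the shared-task dataset).
tweets = [
    "Official update: 120 new cases and 4 deaths reported today",
    "State health dept confirms 35 deaths in nursing homes",
    "Stay home and stay safe everyone",
    "Can't believe this lockdown is still going on",
]
labels = [INFORMATIVE, INFORMATIVE, UNINFORMATIVE, UNINFORMATIVE]

if __name__ == "__main__":
    perturbed = add_pattern(tweets, labels, "10 deaths")
    print("clean F1:", evaluate(shortcut_classifier, tweets, labels))
    print("perturbed F1:", evaluate(shortcut_classifier, perturbed, labels))
```

On this toy data the shortcut rule falls from an F1 of 1.0 to about 0.67, because the perturbed uninformative tweets become false positives. The toy does not reproduce the paper's numbers. It only shows why a classifier that leans on a surface cue is brittle to this kind of edit.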