Paper Title

MM-Claims: A Dataset for Multimodal Claim Detection in Social Media

Authors

Gullal S. Cheema, Sherzod Hakimov, Abdul Sittar, Eric Müller-Budack, Christian Otto, Ralph Ewerth

Abstract

In recent years, the problem of misinformation on the web has become widespread across languages, countries, and various social media platforms. Although there has been much work on automated fake news detection, the role of images and their variety are not well explored. In this paper, we investigate the roles of image and text at an earlier stage of the fake news detection pipeline, called claim detection. For this purpose, we introduce a novel dataset, MM-Claims, which consists of tweets and corresponding images over three topics: COVID-19, Climate Change and broadly Technology. The dataset contains roughly 86000 tweets, out of which 3400 are labeled manually by multiple annotators for the training and evaluation of multimodal models. We describe the dataset in detail, evaluate strong unimodal and multimodal baselines, and analyze the potential and drawbacks of current models.
