How Many Tweets Do We Need?: Efficient Mining of Short-Term Polarized Topics on Twitter: A Case Study From Japan

Paper Title

How Many Tweets Do We Need?: Efficient Mining of Short-Term Polarized Topics on Twitter: A Case Study From Japan

Authors

Fukuma, Tomoki; Noda, Koki; Kumagai, Hiroki; Yamamoto, Hiroki; Ichikawa, Yoshiharu; Kambe, Kyosuke; Maubuchi, Yu; Toriumi, Fujio

Abstract

In recent years, social media has been criticized for fostering polarization. Identifying emerging disagreements and growing polarization is important for journalists who need to create alerts and provide more balanced coverage. While recent studies have shown that polarization exists on social media, they have focused primarily on a limited range of topics, such as politics, using large volumes of data collected over long periods of months or years. These findings are helpful, but they arrive too late to support immediate alerts. To address this gap, we develop a domain-agnostic mining method to identify polarized topics on Twitter over a short period, namely 12 hours. We find that 31.6% of daily Japanese news-related topics in early 2022 were polarized within a 12-hour range. We also find that these topics tend to form information diffusion networks with a relatively high average degree, and that half of the tweets are created by a relatively small number of people. However, because of the limitations of the Twitter API, it is costly and impractical to collect a large volume of tweets on many topics every day and monitor their polarization. To make monitoring more cost-efficient, we also develop a prediction method that uses machine learning and network information to estimate the polarization level from randomly collected tweets. Extensive experiments show a significant saving in collection costs compared to baseline methods. In particular, our approach achieves an F-score of 0.85 while requiring 4,000 tweets, a 4x saving over the baseline. To the best of our knowledge, our work is the first to predict the polarization level of topics from low-resource tweets. Our findings have profound implications for the news media, allowing journalists to detect and disseminate polarizing information quickly and efficiently.
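
The abstract mentions two descriptive statistics: the average degree of the information diffusion network, and the concentration of tweets among a small number of users. The Python sketch below illustrates one way such quantities could be computed. It is not the authors' implementation: the function names, the treatment of the network as an undirected graph built from (user, user) pairs, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def average_degree(edges):
    """Average degree of an undirected graph given as (u, v) pairs.

    Assumption: duplicate edges and self-loops are ignored.
    """
    nodes = set()
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue  # skip self-loops
        nodes.update((u, v))
        unique_edges.add(frozenset((u, v)))
    if not nodes:
        return 0.0
    # Each edge contributes to the degree of two nodes.
    return 2 * len(unique_edges) / len(nodes)


def users_covering_half(tweet_authors):
    """Smallest number of most-active authors who wrote at least half of the tweets."""
    counts = Counter(tweet_authors)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if 2 * covered >= total:
            return rank
    return 0  # no tweets at all


# Hypothetical toy data: who retweeted whom, and who authored each tweet.
retweet_pairs = [("a", "b"), ("c", "b"), ("d", "b"), ("a", "c")]
authors = ["a", "a", "a", "b", "c", "d"]

print(average_degree(retweet_pairs))
print(users_covering_half(authors))
```

On real data, the edge list would come from retweet or reply relations within a single topic's 12-hour window. How the paper actually defines its network is not stated in the abstract.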
