Paper Title
Using Reports of Own and Others' Symptoms and Diagnosis on Social Media to Predict COVID-19 Case Counts: Observational Infoveillance Study in Mainland China
Authors
Abstract
Can public social media data be harnessed to predict COVID-19 case counts? We analyzed approximately 15 million COVID-19-related posts on Weibo, a popular Twitter-like social media platform in China, from November 1, 2019 to March 31, 2020. We developed a machine learning classifier to identify "sick posts," which are reports of one's own and other people's symptoms and diagnosis related to COVID-19. We then modeled the predictive power of sick posts and other COVID-19 posts on daily case counts. We found that reports of symptoms and diagnosis of COVID-19 significantly predicted daily case counts, up to 14 days ahead of official statistics. Other COVID-19 posts did not have similar predictive power. For a subset of geotagged posts (3.10% of all retrieved posts), we found that the predictive pattern held true for both Hubei province and the rest of mainland China, regardless of the unequal distribution of healthcare resources and the outbreak timeline. Researchers and disease control agencies should pay close attention to the social media infosphere regarding COVID-19. On top of monitoring overall search and posting activities, it is crucial to sift through the content and efficiently identify true signals from noise.