Paper Title

TWikiL -- The Twitter Wikipedia Link Dataset

Author

Meier, Florian

Abstract

Recent research has shown how strongly Wikipedia and other web services or platforms are connected. For example, search engines rely heavily on surfacing Wikipedia links to satisfy their users' information needs, and volunteer-created Wikipedia content frequently gets re-used on other social media platforms like Reddit. However, publicly accessible datasets that enable researchers to study the interrelationship between Wikipedia and other platforms are sparse. In addition, most studies focus only on certain points in time and do not consider the historical perspective. To begin solving these problems, we developed TWikiL, the Twitter Wikipedia Link Dataset, which contains all Wikipedia links posted on Twitter in the period 2006 to January 2021. We extract Wikipedia links from Tweets and enrich the referenced articles with their respective Wikidata identifiers and Wikipedia topic categories, which will make this dataset immediately useful for a large range of scholarly use cases. In this paper, we describe the data collection process, perform an initial exploratory analysis, and present a comprehensive overview of how this dataset can be useful for the research community.
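
To make the link-extraction step in the abstract concrete, here is a minimal Python sketch of pulling Wikipedia article links out of tweet text. It is an illustration only, not the authors' pipeline: the function name `extract_wikipedia_links`, the regular expression, and the sample text are all assumptions. It also assumes the tweet text already contains expanded URLs rather than shortened ones, and it does not attempt the Wikidata or topic-category enrichment.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern: matches desktop and mobile Wikipedia article URLs
# and captures the language edition and the raw article title.
WIKI_URL = re.compile(
    r"https?://([a-z\-]+)(?:\.m)?\.wikipedia\.org/wiki/([^\s?#]+)",
    re.IGNORECASE,
)


def extract_wikipedia_links(text):
    """Return (language_edition, article_title) pairs found in text.

    Assumption: URLs are already expanded. Trailing punctuation that is
    glued to a URL is not stripped in this sketch.
    """
    links = []
    for match in WIKI_URL.finditer(text):
        lang = match.group(1).lower()
        # Decode percent-escapes and turn underscores back into spaces.
        title = unquote(match.group(2)).replace("_", " ")
        links.append((lang, title))
    return links


sample = (
    "Reading https://en.wikipedia.org/wiki/Alan_Turing and "
    "https://de.m.wikipedia.org/wiki/K%C3%B6ln today"
)
print(extract_wikipedia_links(sample))
```

The resulting (language, title) pairs are the kind of key one would then need to look up a Wikidata identifier for each article; that lookup is left out here because the abstract does not describe how it was done.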
