Paper title
New Datasets and a Benchmark of Document Network Embedding Methods for Scientific Expert Finding
Paper authors
Paper abstract
The scientific literature is growing faster than ever. Finding an expert in a particular scientific domain has never been as hard as it is today, because of the growing number of publications and the ever-growing diversity of fields of expertise. To tackle this challenge, automatic expert finding algorithms rely on the vast heterogeneous scientific network to match textual queries with potential expert candidates. In this direction, document network embedding methods seem to be an ideal choice for building representations of the scientific literature. Citation and authorship links carry important information that complements the textual content of the publications. In this paper, we propose a benchmark for expert finding in document networks by leveraging data extracted from a scientific citation network and three scientific question & answer websites. We compare the performance of several algorithms on these different sources of data and further study the applicability of embedding methods to an expert finding task.