Paper Title
Author Name Disambiguation in Bibliographic Databases: A Survey
Paper Authors
Paper Abstract
Entity resolution has been a challenging and active research area in the field of Information Systems since the last decade. Author Name Disambiguation (AND) in Bibliographic Databases (BD) such as DBLP, Citeseer, and Scopus is a specialized field of entity resolution. Given many citations of underlying authors, the AND task is to find which citations belong to the same author. In this survey, we start with three basic AND problems, followed by the need for a solution and the challenges involved. A generic five-step framework is provided for handling AND issues. The steps are: (1) preparation of the dataset, (2) selection of publication attributes, (3) selection of similarity metrics, (4) selection of models, and (5) clustering performance evaluation. A categorization and elaboration of similarity metrics and methods is also provided. Finally, future directions and recommendations are given for this dynamic area of research.
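As an illustration only, the five steps of the framework can be sketched on a toy example. The sketch below is not a method from the survey: the citation records, the choice of co-authors and title words as attributes, the equally weighted Jaccard similarity, the single-link clustering with a threshold of 0.2, and the pairwise F1 measure are all assumptions made for the sake of a small runnable example.

```python
from itertools import combinations

# Step 1: dataset preparation. These records are hypothetical and are not
# taken from DBLP, Citeseer, Scopus, or any other real database.
CITATIONS = [
    {"id": 0, "author": "J. Smith", "coauthors": {"A. Kumar", "L. Chen"},
     "title": "entity resolution in digital libraries", "label": "smith_1"},
    {"id": 1, "author": "J. Smith", "coauthors": {"L. Chen"},
     "title": "scalable entity resolution for citations", "label": "smith_1"},
    {"id": 2, "author": "J. Smith", "coauthors": {"M. Rossi"},
     "title": "protein folding with molecular dynamics", "label": "smith_2"},
    {"id": 3, "author": "J. Smith", "coauthors": {"M. Rossi", "P. Novak"},
     "title": "molecular dynamics of protein complexes", "label": "smith_2"},
]


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(x, y):
    """Steps 2 and 3: compare two citations on the chosen attributes.

    The attributes (co-authors, title words) and the equal weights are
    assumptions of this sketch.
    """
    coauthor_sim = jaccard(x["coauthors"], y["coauthors"])
    title_sim = jaccard(set(x["title"].split()), set(y["title"].split()))
    return 0.5 * coauthor_sim + 0.5 * title_sim


def cluster(citations, threshold=0.2):
    """Step 4: single-link clustering using a union-find structure.

    Two citations end up in the same cluster if a chain of pairs with
    similarity >= threshold connects them. The threshold is an assumption.
    """
    parent = {c["id"]: c["id"] for c in citations}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for x, y in combinations(citations, 2):
        if similarity(x, y) >= threshold:
            parent[find(x["id"])] = find(y["id"])
    return {c["id"]: find(c["id"]) for c in citations}


def pairwise_f1(citations, assignment):
    """Step 5: pairwise F1 of predicted clusters against the true labels."""
    tp = fp = fn = 0
    for x, y in combinations(citations, 2):
        same_pred = assignment[x["id"]] == assignment[y["id"]]
        same_true = x["label"] == y["label"]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    assignment = cluster(CITATIONS)
    print(assignment)
    print(pairwise_f1(CITATIONS, assignment))
```

In this toy setting all four citations share the name "J. Smith", so the name alone cannot separate the two underlying authors; the co-author and title overlap is what lets the clustering step split them.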