Paper Title

The Fellowship of the Authors: Disambiguating Names from Social Network Context

Paper Authors

Ryan Muther, David Smith

Paper Abstract

Most NLP approaches to entity linking and coreference resolution focus on retrieving similar mentions using sparse or dense text representations. The common "Wikification" task, for instance, retrieves candidate Wikipedia articles for each entity mention. For many domains, such as bibliographic citations, authority lists with extensive textual descriptions for each entity are lacking and ambiguous named entities mostly occur in the context of other named entities. Unlike prior work, therefore, we seek to leverage the information that can be gained from looking at association networks of individuals derived from textual evidence in order to disambiguate names. We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods. We experiment with data consisting of lists of names from two domains: bibliographic citations from CrossRef and chains of transmission (isnads) from classical Arabic histories. We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora, and that the availability of bibliographic information, such as publication venue or title, can also increase performance on this task. We also present a novel supervised cluster inference model which gives competitive performance for little computational effort, making it ideal for situations where individuals must be identified without relying on an exhaustive authority list.
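To make the pipeline the abstract describes more concrete, the sketch below embeds ambiguous name mentions with an off-the-shelf BERT model and groups them with unsupervised agglomerative clustering. It is a minimal illustration, not the authors' implementation: the model name (bert-base-multilingual-cased), the toy citation strings, and the cosine-distance threshold are assumptions, and the paper's graph induction and supervised cluster inference steps are omitted.

```python
# Minimal sketch (not the authors' code): BERT mention embeddings plus
# unsupervised agglomerative clustering over ambiguous name mentions.
import torch
from sklearn.cluster import AgglomerativeClustering
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed_mention(context: str, mention: str) -> torch.Tensor:
    """Mean-pool the BERT vectors of the tokens inside the mention span."""
    enc = tokenizer(context, return_tensors="pt", truncation=True,
                    return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()  # (char_start, char_end) per token
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, hidden_size)
    start = context.index(mention)                   # naive substring match
    end = start + len(mention)
    idx = [i for i, (s, e) in enumerate(offsets)
           if s >= start and e <= end and s < e]     # s < e skips special tokens
    return hidden[idx].mean(dim=0)

# Toy citation strings sharing an ambiguous surname (invented for illustration).
contexts = [
    "Smith, D. and Muther, R. Name disambiguation in citation graphs.",
    "Smith, D. Transmission chains in classical Arabic histories.",
    "Smith, J. A survey of protein folding.",
]
vectors = torch.stack([embed_mention(c, "Smith") for c in contexts]).numpy()

# Cosine-distance agglomerative clustering; mentions assigned the same label
# are treated as the same person. The 0.3 threshold is an arbitrary assumption.
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.3, metric="cosine", linkage="average"
).fit_predict(vectors)
print(labels)
```

In the paper's setting, the clustering step would additionally draw on the association network of co-occurring names (and, for citations, bibliographic fields such as venue or title) rather than on mention embeddings alone.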
