Strong Heuristics for Named Entity Linking

Title

Strong Heuristics for Named Entity Linking

Authors

Marko Čuljak, Andreas Spitz, Robert West, Akhil Arora

Abstract

Named entity linking (NEL) in news is a challenging endeavour due to the frequency of unseen and emerging entities, which necessitates the use of unsupervised or zero-shot methods. However, such methods tend to come with caveats, such as no integration of suitable knowledge bases (like Wikidata) for emerging entities, a lack of scalability, and poor interpretability. Here, we consider person disambiguation in Quotebank, a massive corpus of speaker-attributed quotations from the news, and investigate the suitability of intuitive, lightweight, and scalable heuristics for NEL in web-scale corpora. Our best-performing heuristic disambiguates 94% and 63% of the mentions on Quotebank and the AIDA-CoNLL benchmark, respectively. Additionally, the proposed heuristics compare favourably to the state-of-the-art unsupervised and zero-shot methods, Eigenthemes and mGENRE, respectively, thereby serving as strong baselines for unsupervised and zero-shot entity linking.
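The abstract does not spell out which heuristics are used. Purely as a hedged illustration of what a lightweight, unsupervised candidate-ranking heuristic can look like, the sketch below links a mention to its most "popular" Wikidata candidate and measures the fraction of mentions linked correctly. The `Candidate` type, the `sitelinks` popularity field, the QID tie-break, and the function names are all hypothetical assumptions, not the paper's method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """A knowledge-base entry that could match a mention (hypothetical schema)."""
    qid: str        # Wikidata-style identifier of the form "Q<number>"
    sitelinks: int  # assumed popularity proxy, e.g. number of Wikipedia sitelinks


def link_most_popular(candidates: List[Candidate]) -> Optional[str]:
    """Return the QID of the most popular candidate, or None if there are none.

    Ties on popularity are broken in favour of the smaller numeric QID
    (an older Wikidata item); this tie-break is an illustrative assumption.
    """
    if not candidates:
        return None
    # Higher sitelinks wins; among equals, the smaller QID number wins.
    best = max(candidates, key=lambda c: (c.sitelinks, -int(c.qid[1:])))
    return best.qid


def accuracy(predictions: List[Optional[str]], gold: List[str]) -> float:
    """Fraction of mentions whose predicted QID equals the gold QID."""
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

For example, with toy candidates `Candidate("Q300", 12)`, `Candidate("Q200", 85)`, and `Candidate("Q100", 85)`, `link_most_popular` returns `"Q100"`. A rule this simple needs no training data, is trivially interpretable, and scales linearly with the number of candidates, which are the properties the abstract emphasises.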
