Paper title

Anchor Prediction: A Topic Modeling Approach

Authors

Jean Dupuy, Adrien Guille, Julien Jacques

Abstract

Networks of documents connected by hyperlinks, such as Wikipedia, are ubiquitous. Hyperlinks are inserted by authors to enrich the text and facilitate navigation through the network. However, authors tend to insert only a fraction of the relevant hyperlinks, mainly because doing so is time-consuming. In this paper we address an annotation task, which we refer to as anchor prediction. Even though it is conceptually close to link prediction and entity linking, it is a distinct task that requires a specific method. Given a source document and a target document, the task consists of automatically identifying anchors in the source document, i.e., words or terms that should carry a hyperlink pointing to the target document. We propose a contextualized relational topic model, CRTM, that models directed links between documents as a function of the local context of the anchor in the source document and the whole content of the target document. Given the target document, the model can predict anchors in a source document without relying on a dictionary of previously seen mentions or titles, or on any external knowledge graph. Authors can benefit from CRTM by letting it automatically suggest hyperlinks, given a new document and the set of target documents to connect to. It can also benefit readers, by dynamically inserting hyperlinks between the documents they are reading. Experiments conducted on several Wikipedia corpora (in English, Italian and German) highlight the practical usefulness of anchor prediction and demonstrate the relevance of our approach.
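
The abstract does not spell out CRTM's generative process or inference. What follows is therefore only a toy Python sketch of the scoring idea it describes: each position in the source document is scored as a candidate anchor by combining the topics of its local context with the topic proportions of the target document. The link function is borrowed from the original relational topic model, which uses a logistic function of the element-wise product of two topic vectors. Everything else is an assumption made for illustration. This includes the topic-word tables `phi`, the weights `eta` and `nu`, the context `window`, and all function names. The sketch also uses soft word-topic posteriors under a uniform prior in place of the learned topic assignments a real model would use.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical inputs: phi[k] maps a word to p(word | topic k); eta and nu are link weights.
TopicTable = Sequence[Dict[str, float]]


def word_topic_posterior(word: str, phi: TopicTable, eps: float = 1e-9) -> List[float]:
    """p(k | word) under a uniform topic prior; unseen words get a near-uniform posterior."""
    weights = [topic.get(word, eps) for topic in phi]
    total = sum(weights)
    return [w / total for w in weights]


def local_context_topics(tokens: Sequence[str], i: int, phi: TopicTable, window: int) -> List[float]:
    """Average the word-topic posteriors over tokens[i - window : i + window + 1]."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    acc = [0.0] * len(phi)
    for tok in tokens[lo:hi]:
        for k, p in enumerate(word_topic_posterior(tok, phi)):
            acc[k] += p
    return [a / (hi - lo) for a in acc]


def link_probability(ctx: Sequence[float], target_theta: Sequence[float],
                     eta: Sequence[float], nu: float) -> float:
    """RTM-style logistic link: sigmoid(sum_k eta_k * ctx_k * theta_k + nu)."""
    s = sum(e * c * t for e, c, t in zip(eta, ctx, target_theta)) + nu
    return 1.0 / (1.0 + math.exp(-s))


def rank_anchor_candidates(tokens: Sequence[str], target_theta: Sequence[float], phi: TopicTable,
                           eta: Sequence[float], nu: float,
                           window: int = 1) -> List[Tuple[int, str, float]]:
    """Score every source position as an anchor towards the target, highest score first."""
    scored = [
        (i, tok, link_probability(local_context_topics(tokens, i, phi, window), target_theta, eta, nu))
        for i, tok in enumerate(tokens)
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Toy usage: topic 0 ~ football, topic 1 ~ music; the target article is mostly about football.
phi = [
    {"world": 0.3, "cup": 0.4, "final": 0.3},
    {"band": 0.3, "album": 0.4, "guitar": 0.3},
]
source = "the band released an album before the world cup final ended".split()
ranking = rank_anchor_candidates(source, target_theta=[0.9, 0.1], phi=phi, eta=[5.0, 5.0], nu=-2.0)
for i, tok, p in ranking[:3]:
    print(i, tok, round(p, 3))
```

In this toy run, "cup" ranks first because its whole window falls in the football topic that dominates the target. Positions around "album" score below 0.5. A trained model would learn `phi`, `eta` and `nu` from a linked corpus rather than take them as given.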