Paper Title

Crowdsourced Collective Entity Resolution with Relational Match Propagation

Authors

Jiacheng Huang, Wei Hu, Zhifeng Bao, Yuzhong Qu

Abstract

Knowledge bases (KBs) store rich yet heterogeneous entities and facts. Entity resolution (ER) aims to identify entities in KBs that refer to the same real-world object. Recent studies have shown significant benefits of involving humans in the loop of ER. These approaches often resolve entities with pairwise similarity measures over attribute values and resort to the crowd to label the uncertain pairs. However, existing methods still suffer from high labor costs and insufficient labeling to some extent. In this paper, we propose a novel approach called crowdsourced collective ER, which leverages the relationships between entities to infer matches jointly rather than independently. Specifically, it iteratively asks human workers to label selected entity pairs and propagates the labeling information to their neighbors, including distant ones. During this process, we address the problems of candidate entity pruning, probabilistic propagation, optimal question selection, and error-tolerant truth inference. Our experiments on real-world datasets demonstrate that, compared with state-of-the-art methods, our approach achieves superior accuracy with much less labeling.
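The ask-then-propagate loop described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the function name `resolve_with_propagation`, the uncertainty-based question selection, the noisy-or update, and the `margin` and `threshold` parameters are all assumptions made for illustration. It propagates only one hop and omits candidate pruning and error-tolerant truth inference entirely.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]  # a candidate entity pair (entity in KB 1, entity in KB 2)


def resolve_with_propagation(
    priors: Dict[Pair, float],
    neighbors: Dict[Pair, List[Tuple[Pair, float]]],
    ask_crowd: Callable[[Pair], bool],
    budget: int,
    margin: float = 0.3,
    threshold: float = 0.5,
) -> Tuple[Set[Pair], int]:
    """Toy loop: label the most uncertain pair, then update its neighbors.

    All names and update rules here are hypothetical, not taken from the paper.
    """
    prob = dict(priors)          # current match probability per pair
    labeled: Set[Pair] = set()   # pairs already answered by the crowd
    asked = 0
    while asked < budget:
        # Only pairs whose probability is close to 0.5 are worth a question.
        uncertain = [p for p in prob
                     if p not in labeled and abs(prob[p] - 0.5) < margin]
        if not uncertain:
            break
        pair = min(uncertain, key=lambda p: abs(prob[p] - 0.5))
        is_match = ask_crowd(pair)
        asked += 1
        labeled.add(pair)
        prob[pair] = 1.0 if is_match else 0.0
        if is_match:
            # Assumed noisy-or update: a confirmed match raises the
            # probability of each related pair in proportion to edge weight.
            for other, weight in neighbors.get(pair, []):
                if other in prob and other not in labeled:
                    prob[other] = 1.0 - (1.0 - prob[other]) * (1.0 - weight)
    matches = {p for p, v in prob.items() if v > threshold}
    return matches, asked


# Hypothetical data: three candidate pairs, one relational edge.
priors = {("a1", "b1"): 0.5, ("a2", "b2"): 0.55, ("a3", "b3"): 0.1}
neighbors = {("a1", "b1"): [(("a2", "b2"), 0.9)]}
truth = {("a1", "b1"), ("a2", "b2")}
matches, asked = resolve_with_propagation(
    priors, neighbors, lambda p: p in truth, budget=5)
print(sorted(matches), asked)
```

In this toy run, a single crowd answer on `("a1", "b1")` lifts the related pair `("a2", "b2")` out of the uncertain band, so no second question is needed. That is the kind of labeling saving the abstract attributes to joint inference over relationships.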
