Paper Title

High-Precision Extraction of Emerging Concepts from Scientific Literature

Authors

Daniel King, Doug Downey, Daniel S. Weld

Abstract

Identification of new concepts in scientific literature can help power faceted search, scientific trend analysis, knowledge-base construction, and more, but current methods are lacking. Manual identification cannot keep up with the torrent of new publications, while the precision of existing automatic techniques is too low for many applications. We present an unsupervised concept extraction method for scientific literature that achieves much higher precision than previous work. Our approach relies on a simple but novel intuition: each scientific concept is likely to be introduced or popularized by a single paper that is disproportionately cited by subsequent papers mentioning the concept. From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%, compared to 86% for prior work, and a substantially better precision-yield trade-off across the top 15,000 extractions. To stimulate research in this area, we release our code and data (https://github.com/allenai/ForeCite).
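The intuition in the abstract can be illustrated with a toy calculation. The sketch below is not the released ForeCite scoring function; it is a simplified stand-in that assumes a hypothetical `Paper` record and a hypothetical `citation_ratio` helper, and it ignores publication dates, so every other paper counts as "subsequent". For a candidate phrase and a candidate introducing paper, it computes the fraction of the other papers mentioning the phrase that also cite the candidate.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record type used only for this illustration.
    paper_id: str
    text: str
    cites: set[str] = field(default_factory=set)


def citation_ratio(phrase: str, candidate_id: str, papers: list[Paper]) -> float:
    """Fraction of the other papers mentioning `phrase` that cite `candidate_id`."""
    needle = phrase.lower()
    # Papers (other than the candidate itself) whose text contains the phrase.
    mentioning = [
        p for p in papers
        if p.paper_id != candidate_id and needle in p.text.lower()
    ]
    if not mentioning:
        return 0.0
    citing = sum(1 for p in mentioning if candidate_id in p.cites)
    return citing / len(mentioning)


# Toy corpus: paper "A" introduces "dropout"; "simple" is a generic word.
papers = [
    Paper("A", "We introduce dropout, a simple regularizer."),
    Paper("B", "We apply dropout to speech models.", {"A"}),
    Paper("C", "Dropout improves our vision model.", {"A"}),
    Paper("D", "Our simple baseline uses dropout.", {"X"}),
    Paper("E", "A simple study of optimizers.", {"Y"}),
]

dropout_ratio = citation_ratio("dropout", "A", papers)
simple_ratio = citation_ratio("simple", "A", papers)
print(dropout_ratio, simple_ratio)
```

In this toy corpus, two of the three other papers mentioning "dropout" cite paper A, while neither of the two other papers mentioning "simple" does, so a phrase tied to a single introducing paper separates from a generic word. A real system would also need candidate phrase extraction, date filtering, and a ranking threshold, none of which are shown here.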
