SPIDER: Selective Plotting of Interconnected Data and Entity Relations

Paper title

SPIDER: Selective Plotting of Interconnected Data and Entity Relations

Authors

Addepalli, Pranav; Wu, Eric; Bossart, Douglas; Lin, Christina; Smith, Allistar

Abstract

Intelligence analysts have long struggled with an abundance of data that must be investigated on a daily basis. In the U.S. Army, this activity involves reconciling information from various sources, a process that has been automated to a certain extent but remains highly manual. To promote automation, a semantic analysis prototype was designed to aid the intelligence analysis process. This tool, called Selective Plotting of Interconnected Data and Entity Relations (SPIDER), extracts entities and their relationships from text in order to streamline investigations. SPIDER is a web application that can be accessed remotely via a web browser and has three major components: (1) a Java API that reads documents and extracts entities and relationships using Stanford CoreNLP; (2) a Neo4j graph database that stores entities, relationships, and properties; and (3) a JavaScript-based SigmaJS visualization tool for displaying the graph in the browser. SPIDER can scale document analysis to thousands of files for quick visualization, making the intelligence analysis process more efficient and giving military leadership quicker insight into a vast array of potentially hidden knowledge.
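
The data flow described in the abstract (extracted entities and relationships, stored as a graph, then handed to a browser-side renderer) can be illustrated with a minimal sketch. This is not SPIDER's code: it uses plain Python rather than the Java API, Neo4j, or SigmaJS, and the names `Triple` and `build_graph`, the sample triples, and the nodes/edges JSON layout are all assumptions made for illustration. The extraction step itself is not shown; the triples stand in for whatever an NLP pipeline would produce.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical (subject, relation, object) record from an extraction step."""
    subject: str
    relation: str
    obj: str


def build_graph(triples):
    """Merge triples into a nodes/edges structure, reusing one node per entity."""
    node_ids = {}          # entity name -> node id
    nodes, edges = [], []
    for t in triples:
        for name in (t.subject, t.obj):
            if name not in node_ids:
                node_ids[name] = f"n{len(node_ids)}"
                nodes.append({"id": node_ids[name], "label": name})
        edges.append({
            "id": f"e{len(edges)}",
            "source": node_ids[t.subject],
            "target": node_ids[t.obj],
            "label": t.relation,
        })
    return {"nodes": nodes, "edges": edges}


# Made-up sample data standing in for entities extracted from documents.
triples = [
    Triple("Alice", "works_for", "Acme"),
    Triple("Bob", "works_for", "Acme"),
    Triple("Alice", "knows", "Bob"),
]
graph = build_graph(triples)

# Serialize to JSON, the kind of payload a browser-side graph renderer could load.
print(json.dumps(graph, indent=2))
```

In this sketch each entity becomes a single node no matter how many documents mention it, so relationships from separate sources converge on shared nodes. That merging is what lets a graph view surface connections that are hard to see when documents are read one at a time.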
