Paper Title
A logic-based relational learning approach to relation extraction: The OntoILPER system
Authors
Abstract
Relation Extraction (RE), the task of detecting and characterizing semantic relations between entities in text, has gained much importance over the last two decades, mainly in the biomedical domain. Many papers have been published on Relation Extraction using supervised machine learning techniques. Most of these techniques rely on statistical methods, such as feature-based and tree-kernel-based methods. Such statistical learning techniques are usually based on a propositional hypothesis space for representing examples, i.e., they employ an attribute-value representation of features. This kind of representation has some drawbacks, particularly in the extraction of complex relations that demand more contextual information about the instances involved; namely, it cannot effectively capture structural information from parse trees without loss of information. In this work, we present OntoILPER, a logic-based relational learning approach to Relation Extraction that uses Inductive Logic Programming to generate extraction models in the form of symbolic extraction rules. OntoILPER takes advantage of a rich relational representation of examples, which can alleviate the aforementioned drawbacks. We argue that, for several reasons, the proposed relational approach seems to be more suitable for Relation Extraction than statistical ones. Moreover, OntoILPER uses a domain ontology that guides the background knowledge generation process and is used to store the extracted relation instances. The induced extraction rules were evaluated on three protein-protein interaction datasets from the biomedical domain, and the performance of OntoILPER extraction models was compared with that of other state-of-the-art RE systems. The encouraging results seem to demonstrate the effectiveness of the proposed solution.
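To make the contrast between an attribute-value representation and a relational one more concrete, the following is a minimal sketch of what a symbolic extraction rule over relational background knowledge might look like. Everything in it is an assumption for illustration: the predicate names (`token`, `entity_type`, `lemma`, `dep`), the example sentence, and the `interacts` rule are hypothetical and are not OntoILPER's actual predicate vocabulary or induced rules. In an ILP system such a rule would be induced from labeled examples rather than written by hand, and it would normally be evaluated by a logic engine such as Prolog; here it is checked by brute-force search in plain Python.

```python
from itertools import permutations

# Hypothetical background knowledge for the sentence "ProtA binds ProtB",
# encoded as ground facts (predicate, arg1, arg2, ...).
facts = {
    ("token", "t1"), ("token", "t2"), ("token", "t3"),
    ("entity_type", "t1", "protein"),
    ("entity_type", "t3", "protein"),
    ("lemma", "t2", "bind"),
    ("dep", "t2", "t1", "nsubj"),  # (head, dependent, label)
    ("dep", "t2", "t3", "dobj"),
}

def holds(*atom):
    """True if the ground atom is present in the background knowledge."""
    return tuple(atom) in facts

def interacts(a, b):
    """Hypothetical rule, written in Prolog-like notation:
    interacts(A, B) :- entity_type(A, protein), entity_type(B, protein),
                       dep(V, A, nsubj), dep(V, B, dobj), lemma(V, bind).
    V is existentially quantified, so we search every token for a witness."""
    if not (holds("entity_type", a, "protein") and holds("entity_type", b, "protein")):
        return False
    candidates = [f[1] for f in facts if f[0] == "token"]
    return any(
        holds("dep", v, a, "nsubj") and holds("dep", v, b, "dobj") and holds("lemma", v, "bind")
        for v in candidates
    )

# Apply the rule to every ordered pair of protein mentions.
proteins = sorted(f[1] for f in facts if f[0] == "entity_type" and f[2] == "protein")
extracted = [(a, b) for a, b in permutations(proteins, 2) if interacts(a, b)]
print(extracted)  # [('t1', 't3')]
```

The rule links the two arguments through a shared verb node `V` and its dependency labels, so it preserves which protein is the subject and which is the object. A flat feature vector such as `{"e1_type": "protein", "e2_type": "protein", "verb_lemma": "bind"}` would drop that structural link unless it were explicitly engineered into additional features.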