Paper Title

An Informational Space Based Semantic Analysis for Scientific Texts

Authors

Neslihan Suzen, Alexander N. Gorban, Jeremy Levesley, Evgeny M. Mirkes

Abstract

One major problem in Natural Language Processing is the automatic analysis and representation of human language. Human language is ambiguous, and a deeper understanding of semantics and the creation of human-to-machine interaction have required effort in creating schemes for the act of communication and in building common-sense knowledge bases for the 'meaning' in texts. This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts. Computational methods that extract semantic features are used to analyse the relations between texts of messages and 'representations of situations' for a newly created large collection of scientific texts, the Leicester Scientific Corpus. The representation of science-specific meaning is standardised by replacing the situation representations, rather than psychological properties, with vectors of attributes: a list of the scientific subject categories that the text belongs to. First, this paper introduces the 'Meaning Space', in which the informational representation of meaning is extracted from the occurrence of a word in texts across the scientific categories, i.e., the meaning of a word is represented by a vector of Relative Information Gain about the subject categories. Then, the meaning space is statistically analysed for the Leicester Scientific Dictionary-Core, and 'Principal Components of the Meaning' are investigated to describe the adequate dimensions of the meaning. The research in this paper lays the foundation for the geometric representation of the meaning of texts.
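The idea of representing a word by a vector of Relative Information Gain (RIG) over subject categories can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this example assumes the common definition RIG = (H(c) - H(c | w)) / H(c), where c is binary membership of a text in a category and w is binary presence of the word in the text. The toy corpus, the category names, and the function names (`entropy`, `relative_information_gain`, `meaning_vector`) are hypothetical and are not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a binary variable with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def relative_information_gain(docs, word, category):
    """Assumed definition: (H(c) - H(c | w)) / H(c).

    docs is a list of (set_of_words, set_of_categories) pairs.
    """
    n = len(docs)
    in_cat = [category in cats for _, cats in docs]
    has_word = [word in words for words, _ in docs]
    h_c = entropy(sum(in_cat) / n)
    if h_c == 0.0:
        # The category carries no uncertainty, so nothing can be gained.
        return 0.0
    h_cond = 0.0
    for flag in (True, False):
        # Category labels of the texts that do (or do not) contain the word.
        subset = [c for c, w in zip(in_cat, has_word) if w == flag]
        if subset:
            h_cond += len(subset) / n * entropy(sum(subset) / len(subset))
    return (h_c - h_cond) / h_c

def meaning_vector(docs, word, categories):
    """One RIG value per category: the word's point in the 'meaning space'."""
    return [relative_information_gain(docs, word, c) for c in categories]

# Hypothetical toy corpus: each text is a bag of words plus its categories.
docs = [
    ({"quantum", "energy"}, {"Physics"}),
    ({"quantum", "field"}, {"Physics"}),
    ({"cell", "protein"}, {"Biology"}),
    ({"cell", "energy"}, {"Biology"}),
]
categories = ["Physics", "Biology"]

quantum_vec = meaning_vector(docs, "quantum", categories)
energy_vec = meaning_vector(docs, "energy", categories)
print(quantum_vec, energy_vec)
```

In this toy corpus "quantum" separates the two categories perfectly, while "energy" appears equally often in both and so carries no information about them. Stacking such vectors for every word in a dictionary would give a matrix to which principal component analysis could then be applied, which is the kind of analysis the abstract describes at a high level.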
