Paper title

Approaches to the classification of complex systems: Words, texts, and more

Author

Rovenchak, Andrij

Abstract

The chapter starts with introductory information about notions of quantitative linguistics, such as rank-frequency dependence, Zipf's law, and frequency spectra. Similarities between the distributions of words in texts and level occupation in quantum ensembles hint at a superficial analogy with statistical physics. This analogy makes it possible to define various parameters for texts, including "temperature", "chemical potential", entropy, and some others. Such parameters provide a set of variables for classifying texts, which serve as an example of complex systems. Moreover, texts are perhaps the easiest complex systems to collect and analyze. Similar approaches can be developed to study, for instance, genomes, owing to well-known linguistic analogies. We consider a couple of approaches to defining nucleotide sequences in mitochondrial DNAs and viral RNAs and demonstrate their possible application as an auxiliary tool for comparative analysis of genomes. Finally, we discuss entropy as one of the parameters that can be easily computed from rank-frequency dependences. Although entropy is a discriminating parameter in some problems of classification of complex systems, it can be given a proper interpretation only in a limited class of problems. Its overall role and significance remain an open issue.
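
As a rough illustration of how entropy might be obtained from a rank-frequency dependence, the sketch below counts word frequencies in a text, sorts them by rank, and computes the Shannon entropy of the relative frequencies. The abstract does not state which definition of entropy the chapter uses, so the Shannon form is an assumption here, as are the naive tokenization and the hypothetical helper names `rank_frequency` and `shannon_entropy`.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return absolute word frequencies sorted in descending order (rank 1 first)."""
    # Assumption: a token is a lowercase run of word characters.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)


def shannon_entropy(frequencies):
    """Shannon entropy (in nats) of the relative frequencies p_r = f_r / N.

    Assumption: this is one common definition, not necessarily the chapter's.
    """
    total = sum(frequencies)
    return -sum((f / total) * math.log(f / total) for f in frequencies)


sample = "the cat saw the dog and the dog saw the cat"
freqs = rank_frequency(sample)      # frequencies ordered by rank
entropy = shannon_entropy(freqs)    # a single scalar characterizing the text
```

The same two steps could in principle be applied to any sequence that can be split into "words", for example nucleotide sequences once a segmentation rule is chosen; that rule is exactly the part the chapter discusses and is not reproduced here.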
