Paper title
Statistical patterns of word frequency suggesting the probabilistic nature of human languages
Authors
Abstract
Traditional linguistic theories have largely regarded language as a formal system composed of rigid rules. However, their failures in processing real language, the recent successes of statistical natural language processing, and the findings of many psychological experiments suggest that language may be more a probabilistic system than a formal one, and thus cannot be faithfully modeled with the either/or rules of formal linguistic theory. The present study, based on authentic language data, confirmed that important linguistic issues such as linguistic universals, diachronic drift, and language variation can be translated into probability and frequency patterns in parole. These findings suggest that human languages may well be probabilistic systems by nature, and that statistical patterns may well be inherent properties of human languages.