Paper title
A comparison of several AI techniques for authorship attribution on Romanian texts
Authors
Abstract
Determining the author of a text is a difficult task. Here we compare several AI techniques for classifying literary texts written by multiple authors, taking into account only a limited set of parts of speech (prepositions, adverbs, and conjunctions). We also introduce a new dataset of texts written in Romanian, on which we ran the algorithms. The compared methods are Artificial Neural Networks, Support Vector Machines, Multi Expression Programming, Decision Trees with C5.0, and k-Nearest Neighbour. Numerical experiments show, first of all, that the problem is difficult, but that some algorithms achieve reasonably low error rates on the test set.
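The general idea of attributing authorship from function-word usage can be illustrated with a minimal sketch. It is not the paper's implementation: the word list `FUNCTION_WORDS`, the relative-frequency features, and the plain Euclidean k-Nearest Neighbour vote are all assumptions chosen for brevity, and the helper names `features` and `knn_predict` are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative Romanian prepositions, conjunctions, and adverbs.
# This short list is an assumption, not the feature set used in the paper.
FUNCTION_WORDS = ["și", "în", "de", "la", "cu", "pe", "dar", "sau", "că", "acum"]


def features(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def knn_predict(train, query, k=1):
    """Predict an author by majority vote among the k nearest training vectors.

    train is a list of (feature_vector, author) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(author for _, author in nearest)
    return votes.most_common(1)[0][0]
```

In use, one would build `train` by calling `features` on each labelled text and then pass the feature vector of an unseen text to `knn_predict`. The other methods named in the abstract would replace only the classification step, while the feature extraction would stay the same.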