Paper Title

A Deep CNN Architecture with Novel Pooling Layer Applied to Two Sudanese Arabic Sentiment Datasets

Authors

Mustafa Mhamed, Richard Sutcliffe, Xia Sun, Jun Feng, Eiad Almekhlafi, Ephrem A. Retta

Abstract

Arabic sentiment analysis has become an important research field in recent years. Initially, work focused on Modern Standard Arabic (MSA), which is the most widely used form. Since then, work has been carried out on several different dialects, including Egyptian, Levantine and Moroccan. Moreover, a number of datasets have been created to support such work. However, up until now, less work has been carried out on Sudanese Arabic, a dialect which has 32 million speakers. In this paper, two new publicly available datasets are introduced, the 2-Class Sudanese Sentiment Dataset (SudSenti2) and the 3-Class Sudanese Sentiment Dataset (SudSenti3). Furthermore, a CNN architecture, SCM, is proposed, comprising five CNN layers together with a novel pooling layer, MMA, to extract the best features. This SCM+MMA model is applied to SudSenti2 and SudSenti3 with accuracies of 92.75% and 84.39%, respectively. Next, the model is compared to other deep learning classifiers and shown to be superior on these new datasets. Finally, the proposed model is applied to the existing Saudi Sentiment Dataset and to the MSA Hotel Arabic Review Dataset with accuracies of 85.55% and 90.01%, respectively.
