Paper title
emojiSpace: Spatial Representation of Emojis
Paper authors
Abstract
In the absence of nonverbal cues in text messaging, users express part of their emotions through emojis. Including emojis in the vocabulary of language models for text messages can therefore significantly improve many natural language processing (NLP) applications, such as online communication analysis. However, word embedding models are usually trained on very large text corpora, such as Wikipedia or the Google News dataset, that contain very few samples with emojis. In this study, we create emojiSpace, a combined word-emoji embedding built with the word2vec model from the Gensim library in Python. We trained emojiSpace on a corpus of more than 4 billion tweets and evaluated it on an extrinsic task: sentiment analysis on a Twitter dataset containing more than 67 million tweets. For this task, we compared the performance of two classifiers, random forest (RF) and linear support vector machine (SVM). We also compared emojiSpace with two other pre-trained embeddings and demonstrated that it outperforms both.
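To illustrate what a combined word-emoji embedding might involve, the sketch below tokenizes tweets so that each emoji becomes its own vocabulary entry and then passes the tokens to Gensim's word2vec. It is not the authors' pipeline: the emoji code-point ranges, the `tokenize` and `train_embedding` helpers, and all hyperparameters are assumptions chosen for illustration, and the training step assumes Gensim 4.x is installed.

```python
import re

# Rough emoji code-point ranges (an assumption: not exhaustive, and
# multi-code-point sequences such as ZWJ emojis are split into parts).
EMOJI = "[\U0001F300-\U0001FAFF\u2600-\u27BF]"

# Match a single emoji, or a word optionally prefixed by '#' or '@'.
TOKEN_RE = re.compile(EMOJI + r"|[#@]?\w+")


def tokenize(tweet):
    """Lowercase a tweet and split it into word and emoji tokens."""
    return TOKEN_RE.findall(tweet.lower())


def train_embedding(tokenized_tweets, vector_size=100):
    """Train word2vec on tokenized tweets (hypothetical hyperparameters)."""
    # Imported lazily so the tokenizer can be used without Gensim.
    from gensim.models import Word2Vec

    model = Word2Vec(
        sentences=tokenized_tweets,
        vector_size=vector_size,  # embedding dimension
        window=5,                 # context window size
        min_count=1,              # keep rare tokens in this toy setting
        workers=1,
    )
    return model.wv  # KeyedVectors covering both words and emojis


if __name__ == "__main__":
    tweets = ["Love this 😍😍 #happy", "so tired😴 of Mondays"]
    print([tokenize(t) for t in tweets])
```

Because emojis are ordinary tokens after this step, the resulting vectors place words and emojis in the same space, and a downstream classifier such as a random forest or linear SVM could consume, for example, the average vector of a tweet's tokens.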