Paper title
Bidirectional Encoder Representations from Transformers (BERT): A sentiment analysis odyssey
Paper authors
Abstract
The purpose of this study is to investigate the relative effectiveness of four sentiment analysis techniques: (1) an unsupervised lexicon-based model using SentiWordNet; (2) a traditional supervised machine learning model using logistic regression; (3) a supervised deep learning model using Long Short-Term Memory (LSTM); and (4) an advanced supervised deep learning model using Bidirectional Encoder Representations from Transformers (BERT). We analyze a publicly available labeled corpus of 50,000 movie reviews originally posted on the Internet Movie Database (IMDb) using the SentiWordNet lexicon, logistic regression, LSTM, and BERT. The first three models were run on a CPU-based system, whereas BERT was run on a GPU-based system. Sentiment classification performance was evaluated in terms of accuracy, precision, recall, and F1 score. The study puts forth two key insights: (1) the relative efficacy of four advanced and widely used sentiment analysis techniques; and (2) the undisputed superiority of the pre-trained, advanced supervised deep learning BERT model for sentiment analysis of text data. The study gives analytics professionals and academicians working on text analysis a comparative evaluation of the classification performance of key sentiment analysis techniques, including the recently developed BERT. This is the first research endeavor to compare the advanced pre-trained supervised deep learning model BERT vis-à-vis the other sentiment analysis models: LSTM, logistic regression, and SentiWordNet.
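The four evaluation measures named in the abstract can be illustrated with a minimal sketch for binary sentiment labels. The function name `binary_classification_metrics`, the label encoding (1 for positive, 0 for negative), and the toy labels are assumptions made for illustration only; they are not taken from the study, and the numbers are not its results.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (assumed encoding: 1 = positive review, 0 = negative review).
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 1]
toy_scores = binary_classification_metrics(toy_true, toy_pred)
```

With these toy labels there are two true positives, one false positive, one false negative, and two true negatives, so all four measures come out to 2/3. On a balanced corpus accuracy alone is often informative, but reporting precision, recall, and F1 alongside it shows whether errors are concentrated in one class.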