Paper Title

Recommendation Chart of Domains for Cross-Domain Sentiment Analysis: Findings of A 20 Domain Study

Authors

Akash Sheoran, Diptesh Kanojia, Aditya Joshi, Pushpak Bhattacharyya

Abstract

Cross-domain sentiment analysis (CDSA) helps to address data scarcity in scenarios where labelled data for a domain (known as the target domain) is unavailable or insufficient. However, the choice of a domain to leverage (known as the source domain) is, at best, intuitive. In this paper, we investigate text similarity metrics to facilitate source domain selection for CDSA. We report results on 20 domains (all possible pairs) using 11 similarity metrics. Specifically, we compare CDSA performance with these metrics across domain pairs so that a suitable source domain can be selected for a given target domain. The metrics include two novel metrics that evaluate domain adaptability to aid source domain selection when labelled data is available, along with metrics based on word and sentence embeddings for unlabelled data. The goal of our experiments is a recommendation chart that gives the K best source domains for CDSA for a given target domain. We show that the best K source domains returned by our similarity metrics have a precision of over 50% for varying values of K.
