Paper title
Style-transfer and Paraphrase: Looking for a Sensible Semantic Similarity Metric
Paper authors
Paper abstract
The rapid development of natural language processing tasks such as style transfer, paraphrase, and machine translation often calls for the use of semantic similarity metrics. In recent years, many methods for measuring the semantic similarity of two short texts have been developed. This paper provides a comprehensive analysis of more than a dozen such methods. Using a new dataset of fourteen thousand sentence pairs, human-labeled according to their semantic similarity, we demonstrate that none of the metrics widely used in the literature is close enough to human judgment in these tasks. A number of recently proposed metrics provide comparable results, yet Word Mover Distance is shown to be the most reasonable solution for measuring semantic similarity in reformulated texts at the moment.
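
To give a feel for the kind of metric the abstract singles out, here is a minimal sketch of a relaxed variant of Word Mover Distance. It is not the paper's evaluation code. It computes only a well-known lower bound on the full distance, in which each word sends all of its weight to its nearest counterpart in the other text, so no optimal-transport solver is needed. The two-dimensional word vectors and the names `TOY_VECTORS`, `nbow`, `one_sided_cost`, and `relaxed_wmd` are hypothetical and exist only for illustration. A real setup would presumably use pretrained embeddings and an exact solver.

```python
import math
from collections import Counter

# Hypothetical 2-D word vectors, invented purely for illustration.
TOY_VECTORS = {
    "film": (1.0, 0.0),
    "movie": (1.0, 0.1),
    "great": (0.0, 1.0),
    "good": (0.1, 1.0),
    "terrible": (0.0, -1.0),
}


def nbow(tokens):
    """Normalized bag of words: each word's share of the token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def one_sided_cost(source, target):
    """Send each source word's whole weight to its nearest target word."""
    targets = set(target)
    return sum(
        weight * min(math.dist(TOY_VECTORS[word], TOY_VECTORS[t]) for t in targets)
        for word, weight in nbow(source).items()
    )


def relaxed_wmd(a, b):
    """Lower bound on the full distance: the larger one-sided relaxation."""
    return max(one_sided_cost(a, b), one_sided_cost(b, a))


# A paraphrase pair versus a pair with opposite sentiment.
paraphrase = relaxed_wmd(["great", "film"], ["good", "movie"])
opposite = relaxed_wmd(["great", "film"], ["terrible", "movie"])
print(paraphrase, opposite)
```

With these toy vectors the paraphrase pair scores lower (closer) than the pair with opposite sentiment, which is the behavior a semantic similarity metric for reformulated text is expected to show.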