Paper Title
NMTScore: A Multilingual Analysis of Translation-based Text Similarity Measures
Authors
Abstract
Being able to rank the similarity of short text segments is an interesting bonus feature of neural machine translation. Translation-based similarity measures include direct and pivot translation probability, as well as translation cross-likelihood, which has not been studied so far. We analyze these measures in the common framework of multilingual NMT, releasing the NMTScore library (available at https://github.com/ZurichNLP/nmtscore). Compared to baselines such as sentence embeddings, translation-based measures prove competitive in paraphrase identification and are more robust against adversarial or multilingual input, especially if proper normalization is applied. When used for reference-based evaluation of data-to-text generation in 2 tasks and 17 languages, translation-based measures show a relatively high correlation with human judgments.
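To make the idea of a normalized direct translation probability concrete, the following is a minimal sketch and not the NMTScore library's API. It assumes a hypothetical callable `token_log_probs(source, target)` that returns per-token log-probabilities of `target` given `source`; here it is replaced by a toy word-overlap model so the example runs without a translation model. The specific normalization (dividing by the score of reproducing the target from itself) and the averaging over both directions are assumptions made for illustration, since the abstract only states that proper normalization helps.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface: per-token log-probabilities of `target` given `source`.
TokenLogProbs = Callable[[str, str], Sequence[float]]


def avg_token_prob(log_probs: Sequence[float]) -> float:
    """Geometric mean of token probabilities (expects a non-empty sequence)."""
    return math.exp(sum(log_probs) / len(log_probs))


def direct_similarity(a: str, b: str, token_log_probs: TokenLogProbs,
                      normalize: bool = True) -> float:
    """Symmetric similarity built from 'translating' one text into the other."""

    def one_direction(target: str, source: str) -> float:
        score = avg_token_prob(token_log_probs(source, target))
        if normalize:
            # Assumed normalization: divide by the score of reproducing
            # the target from itself, so identical texts score 1.0.
            score /= avg_token_prob(token_log_probs(target, target))
        return score

    # Average both directions so the measure is symmetric in a and b.
    return 0.5 * (one_direction(a, b) + one_direction(b, a))


def toy_token_log_probs(source: str, target: str) -> list:
    """Toy stand-in for an NMT model: target words seen in the source are likely."""
    source_words = set(source.split())
    return [math.log(0.9 if word in source_words else 0.1)
            for word in target.split()]


if __name__ == "__main__":
    reference = "the cat sat"
    print(direct_similarity(reference, "the cat sat", toy_token_log_probs))
    print(direct_similarity(reference, "the cat slept", toy_token_log_probs))
    print(direct_similarity(reference, "stocks fell sharply", toy_token_log_probs))
```

With a real multilingual NMT model, the toy function would be replaced by the model's token-level scores for the target text given the source text; the ranking logic above stays the same.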