A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings

Paper title

A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings

Authors

Iker García-Ferrero, Rodrigo Agerri, German Rigau

Abstract

This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings. Our method integrates multiple word embeddings created from complementary techniques, textual sources, knowledge bases and languages. Existing word vectors are projected to a common semantic space using linear transformations and averaging. With our method the resulting meta-embeddings maintain the dimensionality of the original embeddings without losing information while dealing with the out-of-vocabulary problem. An extensive empirical evaluation demonstrates the effectiveness of our technique with respect to previous work on various intrinsic and extrinsic multilingual evaluations, obtaining competitive results for Semantic Textual Similarity and state-of-the-art performance for word similarity and POS tagging (English and Spanish). The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities. In other words, we can leverage pre-trained source embeddings from a resource-rich language in order to improve the word representations for under-resourced languages.
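The abstract's core idea, projecting existing word vectors into a common space with a linear transformation and then averaging, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names (`length_normalize`, `procrustes_map`, `meta_embed`) are hypothetical, the linear map is learned with orthogonal Procrustes over the shared vocabulary, both sources are assumed to have the same dimensionality, and a word missing from one source simply keeps the vector from the other. The abstract does not say how the paper learns the mappings or handles out-of-vocabulary words, so those choices may well differ.

```python
import numpy as np


def length_normalize(matrix):
    """Scale each row to unit L2 norm (guarding against zero rows)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.maximum(norms, 1e-12)


def procrustes_map(src, tgt):
    """Return the orthogonal W that minimizes ||src @ W - tgt||_F."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def meta_embed(emb_a, emb_b):
    """Combine two word -> vector dicts of equal dimensionality.

    emb_b's space is used as the common space (an assumption of this
    sketch). Vectors from emb_a are rotated into it, then the vectors
    available for each word are averaged.
    """
    # Learn the mapping only from words present in both sources.
    shared = sorted(set(emb_a) & set(emb_b))
    a = length_normalize(np.stack([emb_a[w] for w in shared]))
    b = length_normalize(np.stack([emb_b[w] for w in shared]))
    w_map = procrustes_map(a, b)

    meta = {}
    for word in set(emb_a) | set(emb_b):
        vectors = []
        if word in emb_a:
            vec = length_normalize(emb_a[word][None, :])[0]
            vectors.append(vec @ w_map)  # project into the common space
        if word in emb_b:
            vectors.append(length_normalize(emb_b[word][None, :])[0])
        # Words found in only one source keep that single vector.
        meta[word] = np.mean(vectors, axis=0)
    return meta
```

Because the learned map is a square orthogonal matrix and the combination step is an average rather than a concatenation, the output vectors have the same dimensionality as the inputs, which matches the property the abstract highlights.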
