Paper Title

The Geometry of Multilingual Language Model Representations

Authors

Chang, Tyler A., Tu, Zhuowen, Bergen, Benjamin K.

Abstract

We assess how multilingual language models maintain a shared multilingual representation space while still encoding language-sensitive information in each language. Using XLM-R as a case study, we show that languages occupy similar linear subspaces after mean-centering, evaluated based on causal effects on language modeling performance and direct comparisons between subspaces for 88 languages. The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies. Shifting representations by language means is sufficient to induce token predictions in different languages. However, we also identify stable language-neutral axes that encode information such as token positions and part-of-speech. We visualize representations projected onto language-sensitive and language-neutral axes, identifying language family and part-of-speech clusters, along with spirals, toruses, and curves representing token position information. These results demonstrate that multilingual language models encode information along orthogonal language-sensitive and language-neutral axes, allowing the models to extract a variety of features for downstream tasks and cross-lingual transfer learning.
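The mean-shifting experiment described in the abstract lends itself to a short illustration. The sketch below is not the authors' released code: it estimates per-language mean hidden states at one middle layer of XLM-R using the HuggingFace transformers library, adds the Spanish-minus-English mean difference during the forward pass of an English masked sentence, and inspects the top masked-token predictions. The layer index, the example sentences, and the language_mean helper are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of shifting XLM-R representations by
# language means and checking how masked-token predictions change.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base", output_hidden_states=True)
model.eval()

LAYER = 8  # an assumed middle layer; the paper finds language-sensitive axes are stable there

def language_mean(sentences):
    """Mean hidden-state vector at LAYER, averaged over tokens and sentences."""
    vecs = []
    for text in sentences:
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            states = model(**enc).hidden_states[LAYER][0]  # (seq_len, hidden_dim)
        vecs.append(states.mean(dim=0))
    return torch.stack(vecs).mean(dim=0)

# Tiny illustrative corpora; real estimates would use many sentences per language.
english = ["The cat sat on the mat.", "She reads a book every night."]
spanish = ["El gato se sentó en la alfombra.", "Ella lee un libro cada noche."]
shift = language_mean(spanish) - language_mean(english)  # English -> Spanish direction

# Forward an English masked sentence, adding the shift at the chosen layer.
enc = tokenizer(f"The weather is {tokenizer.mask_token} today.", return_tensors="pt")
with torch.no_grad():
    hidden = model.roberta.embeddings(input_ids=enc["input_ids"])
    for i, layer in enumerate(model.roberta.encoder.layer):
        if i == LAYER:
            hidden = hidden + shift  # push representations toward the Spanish mean
        hidden = layer(hidden)[0]
    logits = model.lm_head(hidden)

mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print([tokenizer.decode([int(t)]) for t in top_ids])
```

Per the abstract's finding, the shifted predictions tend to land on target-language (here Spanish) vocabulary items; removing the shift line recovers the usual English predictions.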
