Bio-inspired Structure Identification in Language Embeddings

Paper Title

Bio-inspired Structure Identification in Language Embeddings

Authors

Hongwei Zhou, Oskar Elek, Pranav Anand, Angus G. Forbes

Abstract

Word embeddings are a popular way to improve downstream performance in contemporary language modeling. However, the underlying geometric structure of the embedding space is not well understood. We present a series of explorations using bio-inspired methodology to traverse and visualize word embeddings, demonstrating evidence of discernible structure. Moreover, our model also produces word similarity rankings that are plausible yet very different from common similarity metrics, mainly cosine similarity and Euclidean distance. We show that our bio-inspired model can be used to investigate how different word embedding techniques result in different semantic outputs, which can emphasize or obscure particular interpretations in textual data.
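
The abstract compares the model's rankings against two baseline metrics, cosine similarity and Euclidean distance. As a minimal sketch of why a ranking depends on the metric at all, the snippet below ranks three toy 2-D vectors against a query under each baseline. The vectors and the helper names (`cosine_similarity`, `rank_by_cosine`, `rank_by_euclidean`) are illustrative assumptions, not taken from the paper, and the code does not implement the bio-inspired model itself.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rank_by_cosine(query, vectors):
    # Higher cosine similarity means "more similar", so sort descending.
    return sorted(range(len(vectors)),
                  key=lambda i: -cosine_similarity(query, vectors[i]))

def rank_by_euclidean(query, vectors):
    # Smaller Euclidean distance means "more similar", so sort ascending.
    return sorted(range(len(vectors)),
                  key=lambda i: math.dist(query, vectors[i]))

# Toy data (an assumption for illustration, not real word embeddings).
query = [1.0, 0.0]
vectors = [
    [10.0, 0.0],  # same direction as the query, but far away
    [1.0, 1.0],   # 45 degrees off, but close
    [0.0, 1.0],   # orthogonal to the query
]

cosine_order = rank_by_cosine(query, vectors)
euclidean_order = rank_by_euclidean(query, vectors)
print("cosine ranking:   ", cosine_order)
print("euclidean ranking:", euclidean_order)
```

Cosine similarity ignores vector length and so favors the far but aligned vector, while Euclidean distance favors the nearby one. Even the two standard baselines can disagree on the nearest neighbor, which is the kind of metric-dependent difference the abstract refers to.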
