Paper Title

Analyzing autoencoder-based acoustic word embeddings

Paper Authors

Yevgen Matusevych, Herman Kamper, Sharon Goldwater

Paper Abstract

Recent studies have introduced methods for learning acoustic word embeddings (AWEs)---fixed-size vector representations of words which encode their acoustic features. Despite the widespread use of AWEs in speech processing research, they have only been evaluated quantitatively in their ability to discriminate between whole word tokens. To better understand the applications of AWEs in various downstream tasks and in cognitive modeling, we need to analyze the representation spaces of AWEs. Here we analyze basic properties of AWE spaces learned by a sequence-to-sequence encoder-decoder model in six typologically diverse languages. We first show that these AWEs preserve some information about words' absolute duration and speaker. At the same time, the representation space of these AWEs is organized such that the distance between words' embeddings increases with those words' phonetic dissimilarity. Finally, the AWEs exhibit a word onset bias, similar to patterns reported in various studies on human speech processing and lexical access. We argue this is a promising result and encourage further evaluation of AWEs as a potentially useful tool in cognitive science, which could provide a link between speech processing and lexical memory.
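To make the setup concrete, below is a minimal sketch, not the authors' implementation, of an autoencoder-based AWE model: an RNN encoder compresses a variable-length sequence of acoustic frames (e.g., MFCCs) into a fixed-size vector, and an RNN decoder reconstructs the frame sequence from that vector. All names, dimensions, and module choices (GRU layers, MSE reconstruction loss) are illustrative assumptions; the paper's specific sequence-to-sequence architecture may differ.

```python
import torch
import torch.nn as nn

class AutoencoderAWE(nn.Module):
    """Sketch of an autoencoder that maps a word's acoustic frames
    to a fixed-size embedding (hyperparameters are assumptions)."""

    def __init__(self, feature_dim=13, hidden_dim=256, embed_dim=128):
        super().__init__()
        self.encoder = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.to_embedding = nn.Linear(hidden_dim, embed_dim)   # fixed-size AWE
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_features = nn.Linear(hidden_dim, feature_dim)  # frame reconstruction

    def embed(self, frames):
        # frames: (batch, time, feature_dim); the final encoder state
        # is projected down to the acoustic word embedding.
        _, h = self.encoder(frames)
        return self.to_embedding(h[-1])

    def forward(self, frames):
        z = self.embed(frames)
        # Condition the decoder on the embedding at every timestep
        # and reconstruct the original frame sequence.
        z_rep = z.unsqueeze(1).expand(-1, frames.size(1), -1)
        out, _ = self.decoder(z_rep)
        return self.to_features(out)

model = AutoencoderAWE()
frames = torch.randn(4, 50, 13)               # 4 word tokens, 50 frames each
recon = model(frames)
loss = nn.functional.mse_loss(recon, frames)  # reconstruction objective

# The analyses described in the abstract operate on the embedding space,
# e.g., the distance between two word embeddings, which should grow with
# the words' phonetic dissimilarity:
e1, e2 = model.embed(frames[:1]), model.embed(frames[1:2])
dist = 1 - nn.functional.cosine_similarity(e1, e2)
```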
