Paper Title
Local and non-local dependency learning and emergence of rule-like representations in speech data by Deep Convolutional Generative Adversarial Networks
Paper Authors
Paper Abstract
This paper argues that training GANs on local and non-local dependencies in speech data offers insights into how deep neural networks discretize continuous data and how symbolic-like rule-based morphophonological processes emerge in a deep convolutional architecture. Acquisition of speech has recently been modeled as a dependency between latent space and data generated by GANs in Beguš (2020b; arXiv:2006.03965), who models learning of a simple local allophonic distribution. We extend this approach to test learning of local and non-local phonological processes that include approximations of morphological processes. We further parallel outputs of the model to results of a behavioral experiment where human subjects are trained on the data used for training the GAN network. Four main conclusions emerge: (i) the networks provide useful information for computational models of speech acquisition even if trained on a comparatively small dataset of an artificial grammar learning experiment; (ii) local processes are easier to learn than non-local processes, which matches both behavioral data in human subjects and typology in the world's languages. This paper also proposes (iii) how we can actively observe the network's progress in learning and explore the effect of training steps on learning representations by keeping latent space constant across different training steps. Finally, this paper shows that (iv) the network learns to encode the presence of a prefix with a single latent variable; by interpolating this variable, we can actively observe the operation of a non-local phonological process. The proposed technique for retrieving learning representations has general implications for our understanding of how GANs discretize continuous speech data and suggests that rule-like generalizations in the training data are represented as an interaction between variables in the network's latent space.
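The abstract describes two probing techniques: holding the latent vector constant across training checkpoints to observe learning progress, and interpolating a single latent variable (the one encoding the prefix) to observe the phonological process in the generated audio. The snippet below is a minimal sketch of that probing setup, not the paper's own code: the `Generator` stand-in, the latent dimensionality, the variable index `PREFIX_VAR`, and the interpolation range are illustrative assumptions (the original work uses a WaveGAN-style convolutional generator).

```python
# Minimal sketch (assumptions noted above): hold a latent vector fixed and
# interpolate one latent variable to inspect what it encodes in the output.
import torch
import torch.nn as nn

LATENT_DIM = 100   # assumed latent size; WaveGAN-style models commonly use 100
PREFIX_VAR = 5     # hypothetical index of the variable encoding the prefix
N_STEPS = 9        # number of interpolation points

class Generator(nn.Module):
    """Stand-in for a trained WaveGAN-style generator mapping z -> waveform."""
    def __init__(self, latent_dim: int = LATENT_DIM, out_len: int = 16384):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_len), nn.Tanh())

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

def interpolate_single_variable(gen: nn.Module, z: torch.Tensor, var_idx: int,
                                lo: float = -4.0, hi: float = 4.0,
                                n_steps: int = N_STEPS) -> torch.Tensor:
    """Keep all latent variables constant except one and sweep it over [lo, hi]."""
    zs = z.repeat(n_steps, 1).clone()
    zs[:, var_idx] = torch.linspace(lo, hi, n_steps)
    with torch.no_grad():
        return gen(zs)            # one generated waveform per interpolation step

# Fix the latent vector once so that differences across training checkpoints
# reflect training progress rather than a newly sampled z.
torch.manual_seed(0)
z_fixed = torch.empty(1, LATENT_DIM).uniform_(-1.0, 1.0)

gen = Generator()                  # in practice, load weights from a saved checkpoint
outputs = interpolate_single_variable(gen, z_fixed, PREFIX_VAR)
print(outputs.shape)               # (N_STEPS, 16384): waveforms to inspect acoustically
```

Reusing the same `z_fixed` with generators restored from successive checkpoints gives the "constant latent space across training steps" comparison the abstract refers to; sweeping `PREFIX_VAR` in a fully trained generator corresponds to observing the prefix-conditioned process.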