Paper title
A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder
Authors
Abstract
Recently, the variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods apply the VAE only to model the speech signal, while noise is modeled with the traditional non-negative matrix factorization (NMF) model. One of the most important reasons for using NMF is that these VAE-based methods cannot disentangle the speech and noise latent variables from the observed signal. Based on Bayesian theory, this paper derives a novel variational lower bound for the VAE, which ensures that the VAE can be trained in a supervised manner and can disentangle speech and noise latent variables from the observed signal. This means that the proposed method can apply the VAE to model both speech and noise signals, in contrast to previous VAE-based SE work. More specifically, the proposed DRL method learns to impose speech and noise signal priors on different sets of latent variables for SE. Experimental results show that the proposed method not only disentangles speech and noise latent variables from the observed signal but also obtains a higher scale-invariant signal-to-distortion ratio (SI-SDR) and speech quality score than a comparable deep neural network (DNN)-based SE method.
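The abstract reports results in terms of the scale-invariant signal-to-distortion ratio. As a minimal sketch of how that metric is commonly computed (this follows the widely used definition, not necessarily the authors' evaluation code; the function name `si_sdr` and the `eps` stabilizer are assumptions made for illustration), the estimate is projected onto the clean reference and the energy of the projection is compared with the energy of the residual:

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB.

    `estimate` and `reference` are equal-length sequences of samples.
    `eps` is a hypothetical stabilizer added here to avoid division by zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")

    # Optimal scaling of the reference: <estimate, reference> / ||reference||^2
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)

    # Target component (scaled reference) and residual distortion
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Toy signals (made up for illustration, not data from the paper)
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 1.0, -1.0, 1.0]
score = si_sdr(estimate, reference)
rescaled_score = si_sdr([2.0 * x for x in estimate], reference)
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is the property that gives the metric its name.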