SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement

Paper Title

SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement

Authors

Rehr, Robert; Gerkmann, Timo

Abstract

In this paper, we address the generalization of deep neural network (DNN) based speech enhancement to unseen noise conditions for the case that training data is limited in size and diversity. To gain more insights, we analyze the generalization with respect to (1) the size and diversity of the training data, (2) different network architectures, and (3) the chosen features. To address (1), we train networks on the Hu noise corpus (limited size), the CHiME 3 noise corpus (limited diversity) and also propose a large and diverse dataset collected based on freely available sounds. To address (2), we compare a fully-connected feed-forward and a long short-term memory (LSTM) architecture. To address (3), we compare three input features, namely logarithmized noisy periodograms, noise aware training (NAT) and the proposed signal-to-noise ratio (SNR) based noise aware training (SNR-NAT). We confirm that rich training data and improved network architectures help DNNs to generalize. Furthermore, we show via experimental results and an analysis using t-distributed stochastic neighbor embedding (t-SNE) that the proposed SNR-NAT features yield robust and level independent results in unseen noise even with simple network architectures and when trained on only small datasets, which is the key contribution of this paper.
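The abstract contrasts three input features without defining them. The sketch below is a minimal, hedged illustration of why an SNR-based feature can be level independent while a logarithmized noisy periodogram is not. It rests on assumptions that are not taken from the paper: the noise PSD is estimated as the per-bin mean of the first few frames (a crude stand-in for a real noise tracker), the NAT-style input simply appends the log noise estimate to each frame, and the SNR-based feature is taken to be the logarithmized a posteriori SNR. The paper's actual SNR-NAT definition and noise estimator may differ, and all function names are hypothetical.

```python
import math
import random

EPS = 1e-12  # small floor to avoid log(0)


def log_periodogram(frames):
    """Baseline feature: logarithmized noisy periodogram log|Y(k, l)|^2."""
    return [[math.log(p + EPS) for p in frame] for frame in frames]


def estimate_noise_psd(frames, n_init=5):
    """Hypothetical noise PSD estimate: per-bin mean of the first n_init frames,
    assumed to contain noise only (a crude stand-in for a real noise tracker)."""
    head = frames[:n_init]
    return [sum(f[k] for f in head) / len(head) for k in range(len(frames[0]))]


def nat_features(frames):
    """NAT-style input (assumption): log periodogram with the log noise
    estimate appended to every frame."""
    log_noise = [math.log(n + EPS) for n in estimate_noise_psd(frames)]
    return [row + log_noise for row in log_periodogram(frames)]


def snr_features(frames):
    """SNR-based input (assumption): log a posteriori SNR |Y|^2 / noise PSD.
    A global gain scales numerator and denominator alike, so it cancels."""
    noise = estimate_noise_psd(frames)
    return [[math.log((p + EPS) / (n + EPS)) for p, n in zip(frame, noise)]
            for frame in frames]


def max_abs_diff(a, b):
    """Largest element-wise difference between two feature matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    random.seed(0)
    # Synthetic periodogram frames: 20 frames x 8 frequency bins.
    frames = [[random.expovariate(1.0) for _ in range(8)] for _ in range(20)]
    loud = [[100.0 * p for p in f] for f in frames]  # same input, +20 dB in power

    print("log periodogram shift:", max_abs_diff(log_periodogram(frames), log_periodogram(loud)))
    print("NAT shift:", max_abs_diff(nat_features(frames), nat_features(loud)))
    print("SNR feature shift:", max_abs_diff(snr_features(frames), snr_features(loud)))
```

Raising the input level by 20 dB shifts the log-periodogram and NAT-style features by about ln 100 ≈ 4.6 in every bin, whereas the SNR-based feature stays essentially unchanged, because the gain appears in both the periodogram and the noise estimate and cancels in their ratio.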