Paper Title

A fully recurrent feature extraction for single channel speech enhancement

Paper Authors

Muhammed PV Shifas, Claudio Santelli, Vassilis Tsiaras, Yannis Stylianou

Paper Abstract

Convolutional neural network (CNN) modules are widely used to build high-end speech enhancement neural models. However, the feature extraction power of vanilla CNN modules is limited by the dimensionality constraint of the integrated convolution kernels; as a result, they cannot adequately model the noise context information at the feature extraction stage. To this end, by adding a recurrency factor to the feature-extracting CNN layers, we introduce a robust context-aware feature extraction strategy for single-channel speech enhancement. As shown, adding recurrency captures the local statistics of noise attributes at the extracted-feature level, so the proposed model is effective at differentiating speech cues even under very noisy conditions. When evaluated against enhancement models using vanilla CNN modules under unseen noise conditions, the proposed model with recurrency in the feature extraction layers produced a segmental SNR (SSNR) gain of up to 1.5 dB and an improvement of 0.4 in subjective quality on the Mean Opinion Score scale, while the number of parameters to be optimized is reduced by 25%.
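
The abstract does not spell out how the recurrency factor is wired into the CNN feature-extraction layers, so the following is only a minimal PyTorch sketch of the general idea: a causal 1-D convolution whose output at each frame is combined, through a learned feedback weight, with the features of the previous frame. The class name RecurrentConvBlock, the feedback via nn.Linear, and all hyperparameters are assumptions made for illustration, not the model described in the paper.

```python
# Minimal sketch of a "recurrent convolutional" feature-extraction block,
# assuming the recurrency is a learned feedback from the previous frame's
# features into the current frame's conv output. Illustrative only.
import torch
import torch.nn as nn


class RecurrentConvBlock(nn.Module):
    """Causal 1-D convolution whose per-frame output is mixed with the
    previous frame's features, giving context-aware feature extraction."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 11):
        super().__init__()
        # Left-only padding keeps the convolution causal along the time axis.
        self.pad = nn.ConstantPad1d((kernel_size - 1, 0), 0.0)
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size)
        # Recurrent (feedback) transform applied to the previous frame's output.
        self.rec = nn.Linear(out_ch, out_ch, bias=False)
        self.act = nn.Tanh()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_ch, time) noisy input (raw samples or framed features)
        f = self.conv(self.pad(x))                 # (batch, out_ch, time)
        h = torch.zeros(x.size(0), f.size(1), device=x.device)
        outputs = []
        for t in range(f.size(2)):                 # frame-by-frame recurrency
            h = self.act(f[:, :, t] + self.rec(h))
            outputs.append(h)
        return torch.stack(outputs, dim=2)         # (batch, out_ch, time)


if __name__ == "__main__":
    # Toy usage: a short noisy segment (batch of 2, single channel, 800 frames).
    extractor = RecurrentConvBlock(in_ch=1, out_ch=64)
    noisy = torch.randn(2, 1, 800)
    features = extractor(noisy)
    print(features.shape)  # torch.Size([2, 64, 800])
```

Because the hidden state carries forward statistics of earlier frames, the extracted features depend on the recent noise context rather than on the kernel width alone, which is the intuition the abstract describes.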
