Paper Title

Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

Authors

Fu, Szu-Wei; Liao, Chien-Feng; Hsieh, Tsun-An; Hung, Kuo-Hsuan; Wang, Syu-Siang; Yu, Cheng; Kuo, Heng-Cheng; Zezario, Ryandhimas E.; Li, You-Jin; Chuang, Shang-Yi; Lu, Yen-Ju; Tsao, Yu

Abstract

The Transformer architecture has demonstrated superior ability compared with recurrent neural networks in many natural language processing applications. Therefore, our study applies a modified Transformer to a speech enhancement task. Specifically, positional encoding in the Transformer may not be necessary for speech enhancement, so it is replaced by convolutional layers. To further improve the perceptual evaluation of speech quality (PESQ) scores of the enhanced speech, the Transformer, pretrained with an L1 loss, is fine-tuned using a MetricGAN framework. The proposed MetricGAN can be treated as a general post-processing module that further boosts the objective scores of interest. The experiments were conducted on the datasets provided by the organizers of the Deep Noise Suppression (DNS) challenge. Experimental results demonstrated that the proposed system outperformed the challenge baseline by a large margin in both subjective and objective evaluations.
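The abstract does not spell out the MetricGAN objective. The following is a minimal sketch, assuming the least-squares formulation of the original MetricGAN work: a discriminator D learns to predict the normalized metric score (here PESQ) of an (enhanced, clean) pair, with a target of 1.0 for a (clean, clean) pair, while the generator (in this paper, the L1-pretrained Transformer) is fine-tuned so that D's prediction for its output approaches 1.0. The function names and the PESQ normalization range [-0.5, 4.5] are illustrative assumptions, not details confirmed by this paper.

```python
import numpy as np

# Assumed PESQ range used for normalization (hypothetical choice for illustration).
PESQ_MIN, PESQ_MAX = -0.5, 4.5


def normalize_metric(score, lo=PESQ_MIN, hi=PESQ_MAX):
    """Linearly map a metric score to [0, 1] so it can serve as a regression target."""
    return np.clip((np.asarray(score, dtype=float) - lo) / (hi - lo), 0.0, 1.0)


def discriminator_loss(d_clean, d_enhanced, q_enhanced):
    """Least-squares loss that teaches D to imitate the metric.

    d_clean:    D's predictions for (clean, clean) pairs; their target is 1.0.
    d_enhanced: D's predictions for (enhanced, clean) pairs.
    q_enhanced: normalized true metric scores of the enhanced speech.
    """
    return np.mean((d_clean - 1.0) ** 2) + np.mean((d_enhanced - q_enhanced) ** 2)


def generator_loss(d_enhanced, target=1.0):
    """Push D's predicted score of the enhanced speech toward the target (1.0 = best)."""
    return np.mean((d_enhanced - target) ** 2)


# Toy usage with made-up discriminator outputs (not results from the paper).
q = normalize_metric([2.0, 3.0])           # true normalized PESQ of two enhanced clips
d_enh = np.array([0.45, 0.70])             # D's (imperfect) predictions for them
print(discriminator_loss(np.ones(2), d_enh, q))
print(generator_loss(d_enh))
```

Because D only has to approximate the metric, the same fine-tuning loop could in principle target another objective score by swapping the function that produces `q_enhanced`, which is consistent with treating MetricGAN as a general post-processing module.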
