Paper Title
Self-Attention Generative Adversarial Network for Speech Enhancement
Paper Authors
Paper Abstract
Existing generative adversarial networks (GANs) for speech enhancement solely rely on the convolution operation, which may obscure temporal dependencies across the sequence input. To remedy this issue, we propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN (SEGAN) using raw signal input. Further, we empirically study the effect of placing the self-attention layer at the (de)convolutional layers with varying layer indices as well as at all of them when memory allows. Our experiments show that introducing self-attention to SEGAN leads to consistent improvement across the objective evaluation metrics of enhancement performance. Furthermore, applying it at different (de)convolutional layers does not significantly alter performance, suggesting that it can be conveniently applied at the highest-level (de)convolutional layer with the smallest memory overhead.
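To make the described architecture concrete, below is a minimal sketch of a 1-D non-local self-attention layer of the kind the abstract describes attaching to a (de)convolutional layer of SEGAN. This is not the authors' code: the module name `SelfAttention1d`, the query/key/value 1x1 convolutions, the channel-reduction factor, and the learnable gate `gamma` are illustrative assumptions in the style of non-local/SAGAN attention.

```python
# Minimal sketch of a 1-D non-local self-attention layer (SAGAN-style),
# as might be inserted after a (de)convolutional layer of SEGAN.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention1d(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # 1x1 convolutions project the feature map into query/key/value spaces.
        self.query = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv1d(channels, channels, kernel_size=1)
        # Learnable gate: the layer starts as an identity mapping (gamma = 0).
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # x: (batch, channels, time)
        q = self.query(x)                         # (B, C/r, T)
        k = self.key(x)                           # (B, C/r, T)
        v = self.value(x)                         # (B, C, T)
        # Every time step attends to every other time step,
        # capturing long-range dependencies that convolution alone may miss.
        attn = torch.bmm(q.transpose(1, 2), k)    # (B, T, T)
        attn = F.softmax(attn, dim=-1)
        out = torch.bmm(v, attn.transpose(1, 2))  # (B, C, T)
        # Gated residual connection back onto the convolutional features.
        return self.gamma * out + x

# Usage example on a hypothetical high-level feature map
# (batch of 4, 256 channels, 64 downsampled time steps).
feats = torch.randn(4, 256, 64)
attended = SelfAttention1d(256)(feats)
print(attended.shape)  # torch.Size([4, 256, 64])
```

Because the attention matrix is T x T in the time dimension, memory cost grows quadratically with sequence length, which is consistent with the abstract's observation that placing the layer at the highest-level (shortest) (de)convolutional feature map incurs the smallest memory overhead.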