NU-GAN: High-Resolution Neural Upsampling

Paper Title

NU-GAN: High resolution neural upsampling with GAN

Authors

Rithesh Kumar, Kundan Kumar, Vicki Anand, Yoshua Bengio, Aaron Courville

Abstract

In this paper, we propose NU-GAN, a new method for resampling audio from lower to higher sampling rates (upsampling). Audio upsampling is an important problem because productionizing generative speech technology requires operating at high sampling rates. Such applications use audio at 44.1 kHz or 48 kHz, whereas current speech synthesis methods are equipped to handle at most 24 kHz. NU-GAN takes a leap towards solving audio upsampling as a separate component in the text-to-speech (TTS) pipeline by leveraging techniques for audio generation using GANs. ABX preference tests indicate that our NU-GAN resampler can upsample 22 kHz audio to 44.1 kHz such that listeners distinguish it from the original audio at a rate only 7.4% above random chance for a single-speaker dataset and 10.8% above chance for a multi-speaker dataset.
