WaveGrad: Estimating Gradients for Waveform Generation

Paper Title

WaveGrad: Estimating Gradients for Waveform Generation

Authors

Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan

Abstract

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density. The model is built on prior work on score matching and diffusion probabilistic models. It starts from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram. WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality. We find that it can generate high fidelity audio samples using as few as six iterations. Experiments reveal WaveGrad to generate high fidelity audio, outperforming adversarial non-autoregressive baselines and matching a strong likelihood-based autoregressive baseline using fewer sequential operations. Audio samples are available at https://wavegrad.github.io/.
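The iterative refinement described in the abstract can be illustrated with a minimal sketch. It assumes the standard denoising-diffusion sampling update, since the abstract does not spell out the sampler. The names `refine` and `dummy_noise_estimator`, the noise schedule `betas`, and the mel-spectrogram shape are hypothetical choices made for illustration only. The estimator here is a placeholder that returns zeros, so the output is not audio; a trained conditional network would take its place.

```python
import numpy as np


def dummy_noise_estimator(y, mel, noise_level):
    # Hypothetical stand-in for a trained network that predicts the noise
    # component of y given the mel-spectrogram and the current noise level.
    return np.zeros_like(y)


def refine(mel, num_samples, betas, eps_model=dummy_noise_estimator, seed=0):
    """Start from Gaussian white noise and refine it step by step."""
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=np.float64)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Initial signal: Gaussian white noise.
    y = rng.standard_normal(num_samples)

    # Walk the noise schedule backwards; fewer entries in `betas`
    # means fewer sequential refinement steps.
    for n in range(len(betas) - 1, -1, -1):
        eps = eps_model(y, mel, np.sqrt(alpha_bars[n]))
        # Remove the estimated noise (standard diffusion update, assumed here).
        y = (y - betas[n] / np.sqrt(1.0 - alpha_bars[n]) * eps) / np.sqrt(alphas[n])
        if n > 0:
            # Re-inject a smaller amount of noise on every step but the last.
            sigma = np.sqrt((1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n])
            y = y + sigma * rng.standard_normal(num_samples)
    return y


# Six refinement steps, matching the smallest count mentioned in the abstract.
mel = np.zeros((80, 10))              # hypothetical conditioning features
schedule = np.linspace(1e-4, 0.5, 6)  # hypothetical noise schedule
waveform = refine(mel, num_samples=2400, betas=schedule)
```

Changing the length of `schedule` is what trades inference speed against sample quality in this sketch: each entry costs one sequential call to the estimator.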
