End-to-End Neural Speech Coding for Real-Time Communications

Paper Title

End-to-End Neural Speech Coding for Real-Time Communications

Authors

Jiang, Xue; Peng, Xiulian; Zheng, Chengyu; Xue, Huaying; Zhang, Yuan; Lu, Yan

Abstract

Deep-learning-based methods have shown advantages over traditional ones in audio coding, but limited attention has been paid to real-time communications (RTC). This paper proposes TFNet, an end-to-end neural speech codec with low latency for RTC. It adopts an encoder-temporal filtering-decoder paradigm that has seldom been investigated in audio coding. An interleaved structure is proposed for temporal filtering to capture both short-term and long-term temporal dependencies. Furthermore, with end-to-end optimization, TFNet is jointly optimized with speech enhancement and packet loss concealment, yielding a one-for-all network for three tasks. Both subjective and objective results demonstrate the efficiency of the proposed TFNet.
