Neural Feature Predictor and Discriminative Residual Coding for Low-Bitrate Speech Coding

Paper Title

Neural Feature Predictor and Discriminative Residual Coding for Low-Bitrate Speech Coding

Authors

Haici Yang, Wootaek Lim, Minje Kim

Abstract

Low and ultra-low-bitrate neural speech coding achieves unprecedented coding gain by generating speech signals from compact speech features. This paper introduces additional coding efficiency in neural speech coding by reducing the temporal redundancy in the frame-level feature sequence via a recurrent neural predictor. The prediction yields a low-entropy residual representation, which we code discriminatively according to its contribution to the signal reconstruction. Combining feature prediction with discriminative coding results in a dynamic bit allocation algorithm that spends more bits on unpredictable but rare events. As a result, we develop a scalable, lightweight, low-latency, and low-bitrate neural speech coding system. We demonstrate the advantage of the proposed methods using LPCNet as the neural vocoder. Although the proposed method guarantees causality in its prediction, subjective tests and feature space analysis show that our model achieves superior coding efficiency compared to LPCNet and Lyra V2 at very low bitrates.
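
The general idea of coding a prediction residual and spending bits only on frames that are hard to predict can be illustrated with a toy sketch. This is not the authors' implementation. The previous-frame predictor stands in for the recurrent neural predictor, and the threshold rule, the uniform quantizer, and the names `encode_decode`, `threshold`, `step`, and `bits_per_value` are all assumptions made for illustration.

```python
def quantize(residual, step):
    """Uniform scalar quantizer: snap each residual entry to a multiple of `step`."""
    return [step * round(r / step) for r in residual]


def encode_decode(frames, threshold=0.5, step=0.25, bits_per_value=8):
    """Toy closed-loop predictive coding of a feature sequence.

    Returns (reconstructed_frames, bits_spent_per_frame).
    All parameters are hypothetical and chosen for illustration only.
    """
    dim = len(frames[0])
    prev = [0.0] * dim  # decoder-side state; only past frames are used (causal)
    recon, bits = [], []
    for x in frames:
        # Stand-in for the recurrent predictor (assumption): repeat the
        # previously reconstructed frame.
        pred = prev
        residual = [xi - pi for xi, pi in zip(x, pred)]
        if max(abs(r) for r in residual) < threshold:
            # Predictable frame: transmit nothing and keep the prediction.
            y = list(pred)
            bits.append(0)
        else:
            # Unpredictable frame: quantize and transmit the residual.
            q = quantize(residual, step)
            y = [pi + qi for pi, qi in zip(pred, q)]
            bits.append(bits_per_value * dim)
        recon.append(y)
        prev = y  # encoder and decoder both track the reconstruction
    return recon, bits


frames = [[1.0, 2.0], [1.0, 2.0], [1.1, 2.1], [3.0, 0.0]]
recon, bits = encode_decode(frames)
print(bits)
```

In this sketch the two middle frames are close to their predictions and cost no bits, while the first frame and the abrupt change in the last frame each receive a full residual code. That mirrors, in a very simplified form, the dynamic bit allocation described in the abstract.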

Paper Title