Paper title
Non causal deep learning based dereverberation
Paper authors
Paper abstract
In this paper we demonstrate the effectiveness of non-causal context for mitigating the effects of reverberation in deep-learning-based automatic speech recognition (ASR) systems. First, the value of non-causal context using a non-causal FIR filter is shown by comparing the contributions of previous vs. future information. Second, MLP- and LSTM-based dereverberation networks were trained to confirm the effects of causal and non-causal context when used in ASR systems trained with clean speech. The non-causal deep-learning-based dereverberation provides a 45% relative reduction in word error rate (WER) compared to the popular weighted prediction error (WPE) method in experiments with clean training in the REVERB challenge. Finally, an expanded multicondition training procedure used in combination with a semi-enhanced test utterance generation based on combinations of reverberated and dereverberated signals is proposed to reduce any artifacts or distortion that may be introduced by the non-causal dereverberation methods. The combination of both approaches provided average relative reductions in WER equal to 10.9% and 6.0% when compared to the baseline system obtained with the most recent REVERB challenge recipe without and with WPE, respectively.
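The first claim in the abstract, that future (non-causal) context helps an FIR filter undo reverberation, can be illustrated with a toy sketch. This is not the authors' experiment: the two-tap "room" response `rir`, the white-noise "speech" signal, the tap counts, and the helper names `fir_with_context` and `fit_taps` are all assumptions made for illustration. The sketch fits a least-squares FIR filter that maps a reverberated signal back to the clean one, once with past-only context and once with some of the same number of taps moved onto future samples.

```python
import numpy as np

def fir_with_context(x, h, n_future):
    """Apply FIR taps h to x, with the first n_future taps acting on future samples.

    y[n] = sum_k h[k] * x[n + n_future - k], with zeros outside the signal.
    n_future = 0 gives an ordinary causal filter.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    full = np.convolve(x, h)  # length len(x) + len(h) - 1
    return full[n_future:n_future + len(x)]

def fit_taps(x, s, n_taps, n_future):
    """Least-squares FIR taps that map x to s for the given context window."""
    cols = []
    for k in range(n_taps):
        unit = np.zeros(n_taps)
        unit[k] = 1.0
        # Column k holds x[n + n_future - k]
        cols.append(fir_with_context(x, unit, n_future))
    X = np.stack(cols, axis=1)
    h, *_ = np.linalg.lstsq(X, s, rcond=None)
    return h

# Toy data (assumed): white noise stands in for clean speech, and a
# two-tap response whose later tap is stronger stands in for a room.
rng = np.random.default_rng(0)
s = rng.standard_normal(4000)
rir = np.array([0.5, 1.0])
x = np.convolve(s, rir)[:len(s)]

def mse_for(n_future, n_taps=8):
    h = fit_taps(x, s, n_taps, n_future)
    return float(np.mean((fir_with_context(x, h, n_future) - s) ** 2))

mse_causal = mse_for(n_future=0)     # past and present samples only
mse_noncausal = mse_for(n_future=4)  # 4 future taps, 4 past/present taps
print("causal MSE:    ", mse_causal)
print("non-causal MSE:", mse_noncausal)
```

In this toy setting the response is non-minimum-phase, so no stable causal inverse exists and the filter with future context reconstructs the clean signal with a much smaller error. Real room responses and the MLP- and LSTM-based networks described in the abstract are far more complex; the sketch only shows why access to future samples can matter.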