Paper Title
Full-Sum Decoding for Hybrid HMM based Speech Recognition using LSTM Language Model
Paper Authors
Paper Abstract
In hybrid HMM based speech recognition, LSTM language models have been widely applied and have achieved large improvements. Their theoretical capability of modeling unlimited context suggests that no recombination should be applied in decoding. This motivates reconsidering full summation over HMM-state sequences instead of the Viterbi approximation in decoding. We explore the potential gain from more accurate probabilities in terms of decision making and apply full-sum decoding within a modified prefix-tree search framework. The proposed full-sum decoding is evaluated on both the Switchboard and LibriSpeech corpora, using different models trained with the CE and sMBR criteria. Additionally, both MAP and confusion network decoding, as approximated variants of the general Bayes decision rule, are evaluated. Consistent improvements over strong baselines are achieved in almost all cases without extra cost. We also discuss the tuning effort, efficiency, and some limitations of full-sum decoding.
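To make the core distinction concrete: the Viterbi approximation scores a hypothesis by its single best HMM-state path, whereas full-sum decoding accumulates probability over all state paths (the forward algorithm). The following is a minimal sketch of that difference on a toy two-state HMM; the numbers are illustrative assumptions, not values from the paper, and this does not reproduce the paper's prefix-tree search.

```python
import numpy as np

# Toy HMM: 2 states, 3 observation frames (illustrative numbers only).
init = np.array([0.6, 0.4])            # initial state probabilities
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])         # transition probabilities
emit = np.array([[0.9, 0.2],           # emission probability of each
                 [0.5, 0.5],           # frame's observation given
                 [0.1, 0.8]])          # the state

def viterbi_score(init, trans, emit):
    """Probability of the single best state sequence (max over paths)."""
    v = init * emit[0]
    for t in range(1, len(emit)):
        # For each current state, keep the best predecessor only.
        v = (v[:, None] * trans).max(axis=0) * emit[t]
    return v.max()

def full_sum_score(init, trans, emit):
    """Total probability summed over all state sequences (forward algorithm)."""
    a = init * emit[0]
    for t in range(1, len(emit)):
        # Sum over all predecessors instead of taking the maximum.
        a = (a[:, None] * trans).sum(axis=0) * emit[t]
    return a.sum()

# The full sum is always >= the Viterbi score, since it accumulates
# probability mass from every path rather than only the best one.
print(viterbi_score(init, trans, emit) <= full_sum_score(init, trans, emit))  # prints True
```

In real decoders both recursions are carried out in log space for numerical stability (max becomes log-max, sum becomes log-sum-exp); the sketch above stays in probability space only for readability.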