Paper Title
Improved far-field speech recognition using Joint Variational Autoencoder
Paper Authors
Paper Abstract
Automatic speech recognition (ASR) systems degrade considerably when the source speech is corrupted by noise or room impulse responses (RIR). Typically, speech enhancement is applied in both mismatched and matched training and testing scenarios. In the matched setting, the acoustic model (AM) is trained on dereverberated far-field features, while in the mismatched setting the AM is fixed. Recently, mapping speech features from far-field to close-talk using a denoising autoencoder (DA) has been explored. In this paper, we focus on matched scenario training and show that the proposed joint variational autoencoder (VAE) based mapping achieves a significant improvement over DA. Specifically, we observe an absolute improvement of 2.5% in word error rate (WER) compared to DA-based enhancement and 3.96% compared to an AM trained directly on far-field filterbank features.
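The abstract reports its results as absolute changes in word error rate. As a reminder of how that metric is usually defined, here is a minimal Python sketch that computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the paper's own scoring tool may differ in details such as text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; words are split on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (standard Levenshtein DP).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one deleted word ("the") and one substitution
# ("lights" -> "light") against a 5-word reference.
wer = word_error_rate("turn the kitchen lights off", "turn kitchen light off")
print(f"WER = {wer:.2f}")
```

An absolute improvement is the difference between two such WER values in percentage points, as opposed to a relative improvement, which divides that difference by the baseline WER.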