Paper Title
Exploration of End-to-End ASR for OpenSTT -- Russian Open Speech-to-Text Dataset
Authors
Abstract
This paper presents an exploration of end-to-end automatic speech recognition (ASR) systems for OpenSTT, the largest open-source Russian language dataset. We evaluate different existing end-to-end approaches such as joint CTC/Attention, RNN-Transducer, and Transformer. All of them are compared with a strong hybrid ASR system based on an LF-MMI TDNN-F acoustic model. On the three available validation sets (phone calls, YouTube, and books), our best end-to-end model achieves word error rates (WER) of 34.8%, 19.1%, and 18.1%, respectively. Under the same conditions, the hybrid ASR system achieves 33.5%, 20.9%, and 18.6% WER.