Paper Title
Bilingual End-to-End ASR with Byte-Level Subwords
Paper Authors
Paper Abstract
In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). We study different representations, including character-level, byte-level, byte pair encoding (BPE), and byte-level byte pair encoding (BBPE) representations, and analyze their strengths and weaknesses. We focus on developing a single end-to-end model to support utterance-based bilingual ASR, where speakers do not alternate between two languages within a single utterance but may change languages across utterances. We conduct our experiments on English and Mandarin dictation tasks, and we find that BBPE with penalty schemes can improve utterance-based bilingual ASR performance by 2% to 5% relative, even with a smaller number of outputs and fewer parameters. We conclude with an analysis that indicates directions for further improving multilingual ASR.
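To make the difference between character-level and byte-level output units concrete, here is a minimal Python sketch. It is not the paper's implementation: the helper names (`char_tokens`, `byte_tokens`, `bytes_to_text`) are hypothetical, UTF-8 is assumed as the byte encoding, and neither BPE/BBPE merging nor the penalty schemes mentioned in the abstract are modeled.

```python
from typing import List


def char_tokens(text: str) -> List[str]:
    """Character-level units: one token per Unicode character."""
    return list(text)


def byte_tokens(text: str) -> List[int]:
    """Byte-level units: one token per UTF-8 byte (values 0..255)."""
    return list(text.encode("utf-8"))


def bytes_to_text(tokens: List[int]) -> str:
    """Map byte tokens back to text.

    Not every byte sequence is valid UTF-8, so invalid or truncated
    sequences are replaced with U+FFFD instead of raising an error.
    """
    return bytes(tokens).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for text in ("hello", "你好"):
        chars = char_tokens(text)
        raw = byte_tokens(text)
        print(text, "chars:", len(chars), "bytes:", len(raw))

    # A byte-level output that stops mid-character does not decode cleanly.
    truncated = byte_tokens("你好")[:-1]
    print(repr(bytes_to_text(truncated)))
```

The sketch illustrates the basic trade-off: a byte-level vocabulary never needs more than 256 base units regardless of language, while a character-level vocabulary for Mandarin grows with the number of distinct characters. In exchange, each Mandarin character spans several bytes in UTF-8, so byte sequences are longer, and a model that emits bytes can produce sequences that are not valid text.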