Paper Title
Improving Deliberation by Text-Only and Semi-Supervised Training
Paper Authors
Paper Abstract
Text-only and semi-supervised training based on audio-only data has gained popularity recently due to the wide availability of unlabeled text and speech data. In this work, we propose incorporating text-only and semi-supervised training into an attention-based deliberation model. By incorporating text-only data when training a Bidirectional Encoder Representations from Transformers (BERT) model for the deliberation text encoder, and by leveraging large-scale text-to-speech and audio-only utterances through joint acoustic and text decoder (JATD) and semi-supervised training, we achieve a 4%-12% WER reduction on various tasks compared to the baseline deliberation model. Compared to a state-of-the-art language model (LM) rescoring method, the deliberation model reduces the Google Voice Search WER by 11% relative. We show that the deliberation model also achieves a positive human side-by-side evaluation compared to the state-of-the-art LM rescorer, with reasonable endpointer latencies.