Paper Title

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

Authors

Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

Abstract

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-training. By doing so, we are able to achieve word-error-rates (WERs) 1.4%/2.6% on the LibriSpeech test/test-other sets against the current state-of-the-art WERs 1.7%/3.3%.
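For readers unfamiliar with the metric, the WER figures quoted in the abstract can be read as a word-level edit distance divided by the number of reference words. The following is a minimal sketch under that standard definition, not the authors' evaluation code; the function names `word_error_rate` and `relative_reduction` are hypothetical helpers introduced here for illustration, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` improves on `baseline`."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # One substitution and one deletion against six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Relative improvement implied by the WERs quoted in the abstract.
    print(relative_reduction(1.7, 1.4), relative_reduction(3.3, 2.6))
```

Applied to the numbers in the abstract, going from 1.7% to 1.4% and from 3.3% to 2.6% corresponds to relative WER reductions of roughly 18% and 21%, respectively.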
