Paper Title

Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition

Authors

Sanket Shah, Basil Abraham, Gurunath Reddy M, Sunayana Sitaram, Vikas Joshi

Abstract

Recently, there has been significant progress made in Automatic Speech Recognition (ASR) of code-switched speech, leading to gains in accuracy on code-switched datasets in many language pairs. Code-switched speech co-occurs with monolingual speech in one or both languages being mixed. In this work, we show that fine-tuning ASR models on code-switched speech harms performance on monolingual speech. We point out the need to optimize models for code-switching while also ensuring that monolingual performance is not sacrificed. Monolingual models may be trained on thousands of hours of speech which may not be available for re-training a new model. We propose using the Learning Without Forgetting (LWF) framework for code-switched ASR when we only have access to a monolingual model and do not have the data it was trained on. We show that it is possible to train models using this framework that perform well on both code-switched and monolingual test sets. In cases where we have access to monolingual training data as well, we propose regularization strategies for fine-tuning models for code-switching without sacrificing monolingual accuracy. We report improvements in Word Error Rate (WER) in monolingual and code-switched test sets compared to baselines that use pooled data and simple fine-tuning.
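For readers unfamiliar with Learning Without Forgetting, the sketch below illustrates the general idea behind an LWF-style adaptation step for ASR: the frozen monolingual model provides soft targets on the code-switched audio, and a distillation term keeps the adapted model close to those targets while a standard CTC loss fits the code-switched transcripts. This is a minimal illustration of the technique named in the abstract, not the authors' implementation; the network, the hyperparameters (`lwf_weight`, `temperature`), and the tensor shapes are assumptions made for the example.

```python
# Minimal LWF-style training step sketch (assumed setup, not the paper's code).
import copy
import torch
import torch.nn.functional as F

vocab_size, feat_dim = 100, 80

# Stand-in acoustic model; in practice this would be the pretrained
# monolingual ASR network emitting per-frame label logits.
monolingual_model = torch.nn.Sequential(
    torch.nn.Linear(feat_dim, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, vocab_size),
)

# Initialize the code-switched model from the monolingual one and freeze
# the original so it only supplies soft targets (the "old task" outputs).
cs_model = copy.deepcopy(monolingual_model)
teacher = monolingual_model.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(cs_model.parameters(), lr=1e-4)
ctc = torch.nn.CTCLoss(blank=0, zero_infinity=True)
lwf_weight, temperature = 0.5, 2.0  # assumed hyperparameters


def train_step(feats, feat_lens, targets, target_lens):
    """One LWF-style update on a batch of code-switched utterances.

    feats:   (time, batch, feat_dim) acoustic features
    targets: (batch, max_target_len) label indices (0 reserved for blank)
    """
    logits = cs_model(feats)  # (T, B, vocab_size)

    # Task loss: fit the code-switched transcripts.
    task_loss = ctc(F.log_softmax(logits, dim=-1), targets, feat_lens, target_lens)

    # Distillation loss: stay close to the frozen monolingual model's
    # output distribution on the same (new-task) audio.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(feats) / temperature, dim=-1)
    distill_loss = F.kl_div(
        F.log_softmax(logits / temperature, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * temperature ** 2

    loss = task_loss + lwf_weight * distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Dummy batch: 4 utterances, 50 frames each, 12-label transcripts.
    feats = torch.randn(50, 4, feat_dim)
    feat_lens = torch.full((4,), 50, dtype=torch.long)
    targets = torch.randint(1, vocab_size, (4, 12))
    target_lens = torch.full((4,), 12, dtype=torch.long)
    print(train_step(feats, feat_lens, targets, target_lens))
```

The regularized fine-tuning variant mentioned in the abstract (used when the monolingual training data is available) would replace or complement the distillation term with a penalty that keeps the adapted weights or outputs close to the monolingual model while fine-tuning on the mixed data; the abstract does not specify the exact form, so none is shown here.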
