Paper Title
Monolingual Recognizers Fusion for Code-switching Speech Recognition
Paper Authors
Paper Abstract
The bi-encoder structure has been intensively investigated for code-switching (CS) automatic speech recognition (ASR). However, most existing methods require the structures of the two monolingual ASR models (MAMs) to be the same and use only the encoders of the MAMs. As a result, pre-trained MAMs cannot be promptly and fully exploited for CS ASR. In this paper, we propose a monolingual recognizers fusion method for CS ASR. It has two stages: the speech awareness (SA) stage and the language fusion (LF) stage. In the SA stage, acoustic features are mapped to two language-specific predictions by two independent MAMs. To keep each MAM focused on its own language, we further extend the language-aware training strategy for the MAMs. In the LF stage, a bilingual language model (BELM) fuses the two language-specific predictions to obtain the final prediction. Moreover, we propose a text simulation strategy that simplifies the training of the BELM and reduces its reliance on CS data. Experiments on a Mandarin-English corpus show the efficiency of the proposed method. The mix error rate on the test set is significantly reduced when open-source pre-trained MAMs are used.
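To illustrate the two-stage pipeline described above, here is a minimal sketch (not the authors' implementation). It assumes two hypothetical pre-trained monolingual recognizers, `zh_mam` and `en_mam`, that each map acoustic features to a token-id sequence, and it uses a small Transformer encoder as a stand-in for the BELM; the class and function names are illustrative only.

```python
# Minimal sketch of the SA (speech awareness) + LF (language fusion) stages.
# `zh_mam` and `en_mam` are assumed pre-trained monolingual recognizers
# (hypothetical callables); the BELM below is a placeholder fusion model.
import torch
import torch.nn as nn


class BELM(nn.Module):
    """Fuses two language-specific token sequences into one CS prediction."""

    def __init__(self, vocab_size: int, d_model: int = 256, n_layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, zh_tokens: torch.Tensor, en_tokens: torch.Tensor) -> torch.Tensor:
        # Concatenate the two language-specific hypotheses along the time axis
        # and let self-attention decide which language each output position takes.
        fused_in = torch.cat([zh_tokens, en_tokens], dim=1)
        hidden = self.encoder(self.embed(fused_in))
        return self.out(hidden)  # (batch, len_zh + len_en, vocab)


def recognize_cs(features: torch.Tensor, zh_mam, en_mam, belm: BELM) -> torch.Tensor:
    """SA stage followed by LF stage for one batch of utterances."""
    # SA stage: each monolingual model decodes the same utterance independently.
    zh_tokens = zh_mam(features)  # Mandarin-specific prediction (token ids)
    en_tokens = en_mam(features)  # English-specific prediction (token ids)
    # LF stage: the BELM merges the two hypotheses into the final CS transcript.
    logits = belm(zh_tokens, en_tokens)
    return logits.argmax(dim=-1)
```

Under this sketch, the text simulation strategy could correspond to synthesizing the two language-specific input sequences directly from text (rather than from decoded audio), so that the BELM can be trained with little or no paired CS speech.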