FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition

Paper title

FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition

Authors

Chen, Szu-Jui; Xie, Jiamin; Hansen, John H. L.

Abstract

Self-supervised learning representations (SSLR) have resulted in robust features for downstream tasks in many fields. Recently, several SSLRs have shown promising results on automatic speech recognition (ASR) benchmark corpora. However, previous studies have only shown performance for solitary SSLRs as an input feature for ASR models. In this study, we propose to investigate the effectiveness of diverse SSLR combinations using various fusion methods within end-to-end (E2E) ASR models. In addition, we will show there are correlations between these extracted SSLRs. As such, we further propose a feature refinement loss for decorrelation to efficiently combine the set of input features. For evaluation, we show that the proposed 'FeaRLESS learning features' perform better than systems without the proposed feature refinement loss for both the WSJ and Fearless Steps Challenge (FSC) corpora.
