End-to-End Multi-speaker ASR with Independent Vector Analysis

Paper title

End-to-End Multi-speaker ASR with Independent Vector Analysis

Authors

Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian

Abstract

We develop an end-to-end system for multi-channel, multi-speaker automatic speech recognition. We propose a frontend for joint source separation and dereverberation based on the independent vector analysis (IVA) paradigm. It uses the fast and stable iterative source steering algorithm together with a neural source model. The parameters of the ASR module and the neural source model are optimized jointly from the ASR loss itself. We demonstrate performance competitive with previous systems that use neural beamforming frontends. First, we explore the trade-offs when using different numbers of channels for training and testing. Second, we demonstrate that the proposed IVA frontend performs well on noisy data, even when trained on clean mixtures only. Furthermore, it extends without retraining to the separation of more speakers, which is demonstrated on mixtures of three and four speakers.
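The iterative source steering (ISS) update named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a determined mixture (as many channels as sources), replaces the neural source model with a fixed Laplace-type weighting, and leaves out dereverberation and the ASR back end. The function name `iss_separate` and its parameters are hypothetical.

```python
import numpy as np


def iss_separate(X, n_iter=20, eps=1e-8):
    """Sketch of IVA with iterative source steering (ISS) updates.

    X: complex STFT of the mixture, shape (n_freq, n_chan, n_frames).
    Returns the demixed STFT with the same shape.
    Assumption: a Laplace-type source model stands in for the neural one.
    """
    n_freq, n_chan, n_frames = X.shape
    Y = X.copy()  # equivalent to starting from identity demixing matrices
    for _ in range(n_iter):
        # Source model: one weight per source and frame, shared across frequencies.
        r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0))  # (n_chan, n_frames)
        phi = 1.0 / np.maximum(r, eps)
        for k in range(n_chan):
            yk = Y[:, k, :]  # current estimate of source k, (n_freq, n_frames)
            # Weighted cross-statistics between every source m and source k.
            num = np.einsum("mn,fmn,fn->fm", phi, Y, yk.conj()) / n_frames
            den = np.einsum("mn,fn->fm", phi, np.abs(yk) ** 2) / n_frames
            v = num / np.maximum(den, eps)  # steering coefficients, (n_freq, n_chan)
            # The k-th coefficient rescales source k to unit weighted power.
            v[:, k] = 1.0 - 1.0 / np.sqrt(np.maximum(den[:, k], eps))
            # Rank-one update: subtract a scaled copy of source k from every output.
            Y = Y - v[:, :, None] * yk[:, None, :]
    return Y
```

Each update needs no matrix inversion, which is the property behind the "fast and stable" description. In an end-to-end setting the weights `phi` would come from a trainable network so that gradients of the ASR loss can flow through the iterations; here they are fixed for illustration only.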
