Towards a Unified Conformer Structure: from ASR to ASV Task

Paper Title

Towards A Unified Conformer Structure: from ASR to ASV Task

Authors

Dexin Liao, Tao Jiang, Feng Wang, Lin Li, Qingyang Hong

Abstract

Transformer has achieved extraordinary performance in Natural Language Processing and Computer Vision tasks thanks to its powerful self-attention mechanism, and its variant Conformer has become a state-of-the-art architecture in the field of Automatic Speech Recognition (ASR). However, the mainstream architecture for Automatic Speaker Verification (ASV) is the convolutional neural network, and there is still much room for research on Conformer-based ASV. In this paper, firstly, we modify the Conformer architecture from ASR to ASV with very minor changes. The Length-Scaled Attention (LSA) method and Sharpness-Aware Minimization (SAM) are adopted to improve model generalization. Experiments conducted on VoxCeleb and CN-Celeb show that our Conformer-based ASV achieves competitive performance compared with the popular ECAPA-TDNN. Secondly, inspired by the transfer learning strategy, the ASV Conformer can naturally be initialized from a pretrained ASR model. Via parameter transfer, the self-attention mechanism can better focus on the relationships between sequence features, yielding a relative EER improvement of about 11% on the VoxCeleb and CN-Celeb test sets, which reveals the potential of Conformer to unify ASV and ASR tasks. Finally, we provide a runtime in ASV-Subtools to evaluate its inference speed in production scenarios. Our code is released at https://github.com/Snowdar/asv-subtools/tree/master/doc/papers/conformer.md.
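
The abstract reports its gains as a relative improvement in EER (equal error rate), the operating point at which the false acceptance rate equals the false rejection rate. The sketch below illustrates how such figures are commonly computed. It is not taken from ASV-Subtools: the function names compute_eer and relative_improvement are hypothetical, and all scores and EER values in the usage note are made-up illustrations, not results from the paper.

```python
def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate (as a fraction) by a threshold sweep.

    Assumption: a higher score means "more likely the same speaker".
    """
    # Use every observed score as a candidate decision threshold.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction of a new system over a baseline."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Hypothetical trial scores, for illustration only.
    targets = [0.9, 0.8, 0.7, 0.4]
    nontargets = [0.6, 0.3, 0.2, 0.1]
    print(compute_eer(targets, nontargets))
    # Hypothetical EERs (in percent) for a baseline and an improved model.
    print(relative_improvement(1.00, 0.89))
```

With these made-up EER values, dropping from 1.00% to 0.89% corresponds to a relative improvement of about 11%, the same kind of quantity the abstract reports. Production toolkits typically interpolate between thresholds rather than picking the closest observed one, so this sweep is only an approximation on small trial lists.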

Paper Title