Paper Title
Robust Speaker Recognition Using Speech Enhancement And Attention Model
Paper Authors
Paper Abstract
In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of processing speech enhancement and speaker recognition separately, the two modules are integrated into one framework and jointly optimised using deep neural networks. Furthermore, to increase robustness against noise, a multi-stage attention mechanism is employed to highlight speaker-related features learned from contextual information in the time and frequency domains. To evaluate the speaker identification and verification performance of the proposed approach, we test it on VoxCeleb1, one of the most widely used benchmark datasets. Moreover, the robustness of the proposed approach is also tested on VoxCeleb1 data corrupted by three types of interference, general noise, music, and babble, at different signal-to-noise ratio (SNR) levels. The obtained results show that the proposed approach using speech enhancement and multi-stage attention models outperforms two strong baselines without these components in most acoustic conditions in our experiments.
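The multi-stage attention idea described in the abstract (re-weighting a time-frequency feature map along the time axis and then along the frequency axis) can be illustrated with a minimal sketch. This is a hypothetical, simplified rendering for intuition only, not the paper's actual network: the scoring functions, number of stages, and feature dimensions are all assumptions, and a real model would learn the attention parameters jointly with the enhancement and speaker modules.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_stage_attention(feat):
    """Toy two-stage attention over a (time, freq) feature map.

    Stage 1 weights frames (time axis), stage 2 weights frequency
    bins; both use mean activation as a stand-in for a learned scorer.
    """
    # Stage 1: temporal attention -- emphasise informative frames
    t_w = softmax(feat.mean(axis=1))      # shape (T,)
    feat = feat * t_w[:, None]
    # Stage 2: spectral attention -- emphasise informative bins
    f_w = softmax(feat.mean(axis=0))      # shape (F,)
    return feat * f_w[None, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((100, 40))     # e.g. 100 frames x 40 mel bins
out = multi_stage_attention(feat)
print(out.shape)
```

In the paper's setting, the attended features would then feed a speaker-embedding network, with the enhancement front-end and this attention trained end-to-end under a joint loss.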