Gender domain adaptation for automatic speech recognition task

Paper title

Gender domain adaptation for automatic speech recognition task

Authors

Artem Sokolov, Andrey V. Savchenko

Abstract

This paper focuses on fine-tuning acoustic models for speaker adaptation to a given gender. We pretrained a Transformer baseline model on Librispeech-960 and conducted fine-tuning experiments on the gender-specific test subsets. In general, fine-tuning with this approach does not yield a substantial reduction in word error rate (WER). We achieved up to ~5% lower WER on the male subset and 3% lower on the female subset if the layers in the encoder and decoder are not frozen but tuning is started from the last checkpoints. Moreover, we adapted our base model on the full L2 Arctic dataset of accented speech and fine-tuned it for particular speakers and for male and female speakers separately. The models trained on the gender subsets obtained 1-2% higher accuracy than the model tuned on the whole L2 Arctic dataset. Finally, we tested concatenating pretrained x-vector voice embeddings with embeddings from a conventional encoder, but the gain in accuracy was not significant.
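
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming whitespace tokenization and no text normalization. The function name is illustrative and is not taken from the paper, and the abstract does not describe the authors' scoring setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

With this definition, a hypothesis that substitutes one word and drops another from a six-word reference scores 2/6.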
