Paper title
Gender domain adaptation for automatic speech recognition task
Paper authors
Paper abstract
This paper focuses on fine-tuning acoustic models for speaker adaptation to a given gender. We pretrained a Transformer baseline model on Librispeech-960 and conducted fine-tuning experiments on the gender-specific test subsets. In general, fine-tuning in this way does not yield a substantial WER reduction. We achieved up to ~5% lower word error rate on the male subset and 3% lower on the female subset if the layers in the encoder and decoder are not frozen, but the tuning is started from the last checkpoints. Moreover, we adapted our base model on the full L2 Arctic dataset of accented speech and fine-tuned it for particular speakers and for male and female genders separately. The models trained on the gender subsets obtained 1-2% higher accuracy than the model tuned on the whole L2 Arctic dataset. Finally, we tested concatenating pretrained x-vector voice embeddings with embeddings from a conventional encoder, but the gain in accuracy is not significant.
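The results above are reported as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example transcripts are illustrative assumptions, not taken from the paper, and the authors' actual scoring tool may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts for illustration only.
reference = "the cat sat on the mat"
hypothesis = "the cat sat on mat"
print(word_error_rate(reference, hypothesis))
```

Scoring the same test subset with a baseline and a fine-tuned model in this way gives the two WER values whose difference the abstract summarizes.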