Paper Title

Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training

Paper Authors

Yang, J., He, L.

Paper Abstract

In cross-lingual speech synthesis, the speech in various languages can be synthesized for a monoglot speaker. Normally, only the data of monoglot speakers are available for model training, thus the speaker similarity is relatively low between the synthesized cross-lingual speech and the native language recordings. Based on the multilingual transformer text-to-speech model, this paper studies a multi-task learning framework to improve the cross-lingual speaker similarity. To further improve the speaker similarity, joint training with a speaker classifier is proposed. Here, a scheme similar to parallel scheduled sampling is proposed to train the transformer model efficiently to avoid breaking the parallel training mechanism when introducing joint training. By using multi-task learning and speaker classifier joint training, in subjective and objective evaluations, the cross-lingual speaker similarity can be consistently improved for both the seen and unseen speakers in the training set.
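
To make the training scheme described in the abstract more concrete, below is a minimal PyTorch sketch of joint training with a speaker classifier combined with a two-pass, parallel-scheduled-sampling-style decoder input. It is an illustrative sketch under assumed interfaces, not the authors' implementation: the `tts_decoder` call signature, `SpeakerClassifier` design, `mix_prob`, and the 0.1 loss weight are all assumptions for demonstration.

```python
# Hypothetical sketch (not the paper's code): joint training of a Transformer-TTS
# decoder with a speaker classifier, keeping decoding parallel during training
# via a two-pass scheme similar in spirit to parallel scheduled sampling.
import torch
import torch.nn as nn

class SpeakerClassifier(nn.Module):
    """Frame-level speaker classifier applied to predicted mel frames (illustrative)."""
    def __init__(self, n_mels=80, hidden=256, n_speakers=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_mels, hidden), nn.ReLU(),
            nn.Linear(hidden, n_speakers),
        )

    def forward(self, mel):          # mel: (B, T, n_mels)
        return self.net(mel)         # logits: (B, T, n_speakers)

def joint_training_step(tts_decoder, spk_clf, text_memory, mel_target,
                        speaker_ids, mix_prob=0.5):
    """One training step: TTS reconstruction loss plus a speaker-classification
    loss on the predicted mel spectrogram (assumed decoder interface)."""
    B, T, n_mels = mel_target.shape

    # Pass 1: ordinary teacher forcing -- decoder input is the ground-truth
    # mel shifted right, so all T frames are predicted in parallel.
    go_frame = torch.zeros_like(mel_target[:, :1])
    shifted = torch.cat([go_frame, mel_target[:, :-1]], dim=1)
    mel_pred_tf = tts_decoder(shifted, text_memory)

    # Pass 2: mix ground-truth frames with detached first-pass predictions as
    # decoder input, so prediction-aware training still runs in parallel.
    mix_mask = (torch.rand(B, T, 1, device=mel_target.device) < mix_prob).float()
    mixed_in = mix_mask * mel_pred_tf.detach() + (1.0 - mix_mask) * mel_target
    shifted2 = torch.cat([go_frame, mixed_in[:, :-1]], dim=1)
    mel_pred = tts_decoder(shifted2, text_memory)

    # Reconstruction loss plus speaker loss; the speaker gradient flows back
    # into the decoder through mel_pred, which is the point of joint training.
    recon_loss = nn.functional.l1_loss(mel_pred, mel_target)
    spk_logits = spk_clf(mel_pred).mean(dim=1)        # average logits over time
    spk_loss = nn.functional.cross_entropy(spk_logits, speaker_ids)
    return recon_loss + 0.1 * spk_loss                # weight is illustrative
```

The two-pass structure is the key design choice suggested by the abstract: feeding back the model's own (detached) predictions exposes the classifier and decoder to synthesis-like inputs without reverting to slow, step-by-step autoregressive decoding during training.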
