Paper title
Domain-adversarial training of multi-speaker TTS
Authors
Abstract
Multi-speaker TTS must learn both a speaker embedding and a text embedding to generate speech with the desired linguistic content in the desired voice. However, it is unclear which characteristics of speech result from the speaker and which from the linguistic content. In this paper, the text embedding is forced to unlearn speaker-dependent characteristics by connecting it through a gradient reversal layer to an auxiliary speaker classifier that we introduce. We train the speaker classifier using an angular margin softmax loss. In subjective evaluation, adversarial training of the text embedding for unilingual multi-speaker TTS yields a 39.9% improvement in similarity MOS and a 40.1% improvement in naturalness MOS.
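The two mechanisms named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say which angular margin variant is used, so an additive angular margin on the target class is assumed here, and the function names, the scale and margin values, and the example cosines are all hypothetical.

```python
import math


def grad_reverse_forward(x):
    """Forward pass of a gradient reversal layer: the identity map."""
    return list(x)


def grad_reverse_backward(grad_output, lam=1.0):
    """Backward pass: flip the sign of the incoming gradient, scaled by lam."""
    return [-lam * g for g in grad_output]


def angular_margin_softmax_loss(cosines, target, scale=30.0, margin=0.2):
    """Cross-entropy over scaled cosines, with a margin added to the target angle.

    Assumed additive angular margin variant; the case theta + margin > pi
    is not handled in this simplified sketch.
    """
    logits = []
    for i, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # keep acos inside its domain
        if i == target:
            c = math.cos(math.acos(c) + margin)
        logits.append(scale * c)
    top = max(logits)  # subtract the max for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[target]


# Hypothetical cosine similarities between one text embedding and 3 speaker centers.
cosines = [0.8, 0.1, -0.3]
loss_plain = angular_margin_softmax_loss(cosines, target=0, margin=0.0)
loss_margin = angular_margin_softmax_loss(cosines, target=0, margin=0.2)

# The gradient arriving from the speaker classifier is reversed before it
# reaches the text encoder.
reversed_grad = grad_reverse_backward([0.5, -2.0], lam=0.5)
```

In this sketch the classifier would still minimize its own loss, while the sign flip makes the text encoder receive the opposite update, pushing the text embedding toward carrying less speaker information. The margin makes the target class harder to satisfy, so the loss with a margin is larger than without one for the same cosines.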