Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning With Spoofing Detection and Spoofing Type Classification

Paper Title

Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning With Spoofing Detection and Spoofing Type Classification

Authors

Yeunju Choi, Youngmoon Jung, Hoirin Kim

Abstract

Several studies have proposed deep-learning-based models to predict the mean opinion score (MOS) of synthesized speech, showing the possibility of replacing human raters. However, inter- and intra-rater variability in MOSs makes it hard to ensure the high performance of the models. In this paper, we propose a multi-task learning (MTL) method to improve the performance of a MOS prediction model using the following two auxiliary tasks: spoofing detection (SD) and spoofing type classification (STC). Besides, we use the focal loss to maximize the synergy between SD and STC for MOS prediction. Experiments using the MOS evaluation results of the Voice Conversion Challenge 2018 show that proposed MTL with two auxiliary tasks improves MOS prediction. Our proposed model achieves up to 11.6% relative improvement in performance over the baseline model.
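The abstract does not say how the three objectives are combined, so the following is only a minimal sketch of one common way to set up such a multi-task loss: a squared-error term for the MOS regression plus focal-loss terms for the two auxiliary classifiers. The function names, the task weights `w_sd` and `w_stc`, the focusing parameter `gamma`, and the choice of squared error are all assumptions made for illustration and are not taken from the paper.

```python
import math


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    `logits` is a list of unnormalized class scores and `target` is the
    index of the true class. With gamma = 0 this reduces to cross-entropy.
    """
    # Log-softmax computed in a numerically stable way.
    m = max(logits)
    log_sum = math.log(sum(math.exp(x - m) for x in logits))
    log_p_t = logits[target] - m - log_sum
    p_t = math.exp(log_p_t)
    return -((1.0 - p_t) ** gamma) * log_p_t


def multitask_loss(mos_pred, mos_true,
                   sd_logits, sd_label,
                   stc_logits, stc_label,
                   w_sd=1.0, w_stc=1.0, gamma=2.0):
    """Hypothetical combined loss: MOS regression plus two auxiliary tasks.

    The weights and the squared-error MOS term are illustrative assumptions.
    """
    mos_term = (mos_pred - mos_true) ** 2                  # main task: MOS
    sd_term = focal_loss(sd_logits, sd_label, gamma)       # spoofing detection
    stc_term = focal_loss(stc_logits, stc_label, gamma)    # spoofing type
    return mos_term + w_sd * sd_term + w_stc * stc_term
```

The focal term down-weights examples the classifier already handles confidently, since `(1 - p_t) ** gamma` shrinks toward zero as `p_t` approaches 1, which leaves more of the auxiliary gradient for hard examples.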
