Paper Title
The MSXF TTS System for ICASSP 2022 ADD Challenge
Authors
Abstract
This paper presents our MSXF TTS system for Task 3.1 of the Audio Deep Synthesis Detection (ADD) Challenge 2022. We use an end-to-end text-to-speech system and add a constraint loss during training. The end-to-end TTS system is VITS, and the pre-trained self-supervised model is wav2vec 2.0. We also explore the influence of speech speed and volume on spoofing. Faster speech leaves less silence in the audio, which makes it easier to fool the detector. We also find that the lower the volume, the better the spoofing ability, although we normalize the volume for submission. Our team is identified as C2, and we placed fourth in the challenge.
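
The two signal properties discussed in the abstract, the share of silence in an utterance and its volume, can be made concrete with a small sketch. This is not the authors' pipeline. The frame length, the -40 dB threshold, the 0.9 target peak, the function names, and the toy sine-wave "utterances" are all assumptions chosen for illustration. The sketch only shows how one might measure the fraction of silent frames and apply peak normalization before submission.

```python
import math


def silence_fraction(samples, frame_len=400, threshold_db=-40.0):
    """Fraction of frames whose RMS level is below threshold_db (dB re full scale).

    frame_len and threshold_db are illustrative assumptions, not values from the paper.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    if not frames:
        return 0.0
    silent = 0
    for frame in frames:
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db < threshold_db:
            silent += 1
    return silent / len(frames)


def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the largest absolute value equals target_peak (assumed value)."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [x * gain for x in samples]


def make_utterance(speech_frames, pause_frames, frame_len=400, sample_rate=16000):
    """Toy signal: a 220 Hz tone standing in for speech, followed by a silent pause."""
    n_speech = speech_frames * frame_len
    tone = [0.5 * math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(n_speech)]
    return tone + [0.0] * (pause_frames * frame_len)


# Same amount of "speech", but the faster rendition has a shorter pause.
slow = make_utterance(speech_frames=6, pause_frames=4)
fast = make_utterance(speech_frames=6, pause_frames=1)

# A quieter copy of the slow utterance, then restored by peak normalization.
quiet = [0.1 * x for x in slow]
normalized = peak_normalize(quiet)
```

In this toy setup the faster utterance has a smaller silent-frame fraction than the slower one, and peak normalization brings the quiet copy back to the same peak level regardless of its original volume. Whether either property affects a real spoofing detector is an empirical question that the sketch does not address.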