Audio Deep Fake Detection System with Neural Stitching for ADD 2022

Paper Title

Audio Deep Fake Detection System with Neural Stitching for ADD 2022

Authors

Rui Yan, Cheng Wen, Shuran Zhou, Tingwei Guo, Wei Zou, Xiangang Li

Abstract

This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge \cite{Yi2022ADD}. The same system was used for both rounds of evaluation in Track 3.2, with a similar training methodology. The first-round Track 3.2 data is generated by text-to-speech (TTS) or voice conversion (VC) algorithms, while the second-round data consists of fake audio generated by other participants in Track 3.1 with the aim of spoofing our system. Our system uses a standard 34-layer ResNet with multi-head attention pooling \cite{india2019self} to learn discriminative embeddings for fake audio and spoof detection. We further use neural stitching to boost the model's generalization capability so that it performs equally well across different tasks; more details are given in the following sections. Experiments show that our proposed method outperforms all other systems in Track 3.2, with an equal error rate (EER) of 10.1%.
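
For readers unfamiliar with the metric quoted in the abstract, the sketch below shows one common way to estimate an equal error rate from detector scores. It is illustrative only and is not the challenge's official scoring tool: the function name `equal_error_rate`, the convention that higher scores mean "more likely genuine", and the label encoding (1 = genuine, 0 = fake) are all assumptions made for this example.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER by sweeping a threshold over the observed scores.

    Assumed conventions (not from the paper): a higher score means
    "more likely genuine"; labels use 1 for genuine and 0 for fake.
    """
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    fake = [s for s, y in zip(scores, labels) if y == 0]
    if not genuine or not fake:
        raise ValueError("need at least one genuine and one fake sample")

    best_gap, eer = None, None
    for t in sorted(set(scores)):
        # False acceptance rate: fake audio scored at or above the threshold.
        far = sum(s >= t for s in fake) / len(fake)
        # False rejection rate: genuine audio scored below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    print(equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

The EER is the operating point at which the false acceptance and false rejection rates coincide; with a finite set of scores the two rarely match exactly, so this sketch averages them at the closest threshold.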
