A Practical Guide to Logical Access Voice Presentation Attack Detection

Title

A Practical Guide to Logical Access Voice Presentation Attack Detection

Authors

Xin Wang, Junichi Yamagishi

Abstract

Voice-based human-machine interfaces with an automatic speaker verification (ASV) component are commonly used in the market. However, the threat from presentation attacks is also growing since attackers can use recent speech synthesis technology to produce a natural-sounding voice of a victim. Presentation attack detection (PAD) for ASV, or speech anti-spoofing, is therefore indispensable. Research on voice PAD has seen significant progress since the early 2010s, including the advancement in PAD models, benchmark datasets, and evaluation campaigns. This chapter presents a practical guide to the field of voice PAD, with a focus on logical access attacks using text-to-speech and voice conversion algorithms and spoofing countermeasures based on artifact detection. It introduces the basic concept of voice PAD, explains the common techniques, and provides an experimental study using recent methods on a benchmark dataset. Code for the experiments is open-sourced.
