Paper title
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
Paper authors
Paper abstract
Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech, or tracking the effectiveness of speech therapy, would require systems that can detect dysfluencies while also detecting speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 [1] for the classification of stuttering on a sizeable English corpus containing stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on FluencyBank [2] and the German therapy-centric Kassel State of Fluency (KSoF) [3] dataset by training Support Vector Machine classifiers using features extracted from the fine-tuned models for six different stuttering-related event types: blocks, prolongations, sound repetitions, word repetitions, interjections, and, specific to therapy, speech modifications. Using embeddings from the fine-tuned models leads to relative classification performance gains of up to 27% w.r.t. F1-score.
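To make the phrase "relative gain w.r.t. F1-score" concrete, the following minimal Python sketch computes an F1-score from raw counts and the relative improvement between two scores. The abstract does not spell out the formula, so the definition used here, (improved - baseline) / baseline, is an assumption. The function names and the confusion counts are hypothetical and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement over the baseline, as a fraction (assumed definition)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical counts for a single event type; NOT results from the paper.
baseline_f1 = f1_score(tp=40, fp=30, fn=30)   # general-purpose features
finetuned_f1 = f1_score(tp=50, fp=25, fn=20)  # fine-tuned features

gain = relative_gain(baseline_f1, finetuned_f1)
print(f"baseline F1={baseline_f1:.3f}, fine-tuned F1={finetuned_f1:.3f}, "
      f"relative gain={gain:.1%}")
```

Under this definition, a relative gain of 27% would correspond, for example, to an F1-score rising from 0.500 to 0.635. It does not mean an absolute increase of 27 percentage points.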