Paper Title

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

Authors

Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

Abstract

Automatic recognition of overlapped speech remains a highly challenging task to date. Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. Three issues associated with the construction of audio-visual speech recognition (AVSR) systems are addressed. First, the basic architecture designs of AVSR systems, i.e. end-to-end and hybrid, are investigated. Second, purposefully designed modality fusion gates are used to robustly integrate the audio and visual features. Third, in contrast to a traditional pipelined architecture containing explicit speech separation and recognition components, a streamlined and integrated AVSR system optimized consistently using the lattice-free MMI (LF-MMI) discriminative criterion is also proposed. The proposed LF-MMI time-delay neural network (TDNN) system establishes the state of the art for the LRS2 dataset. Experiments on overlapped speech simulated from the LRS2 dataset suggest that the proposed AVSR system outperforms the audio-only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produces recognition performance comparable to a more complex pipelined system. Consistent performance improvements of 4.89% absolute in WER reduction over the baseline AVSR system using feature fusion are also obtained.
