Paper Title
Digital Voicing of Silent Speech
Paper Authors
Paper Abstract
In this paper, we consider the task of digitally voicing silent speech, where silently mouthed words are converted to audible speech based on electromyography (EMG) sensor measurements that capture muscle impulses. While prior work has focused on training speech synthesis models from EMG collected during vocalized speech, we are the first to train from EMG collected during silently articulated speech. We introduce a method of training on silent EMG by transferring audio targets from vocalized to silent signals. Our method greatly improves intelligibility of audio generated from silent EMG compared to a baseline that only trains with vocalized data, decreasing transcription word error rate from 64% to 4% in one data condition and 88% to 68% in another. To spur further development on this task, we share our new dataset of silent and vocalized facial EMG measurements.
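The abstract's key idea is transferring audio targets from vocalized recordings onto silent EMG signals, which requires aligning two differently-timed signals frame by frame. Below is a minimal sketch of one plausible way to do such a transfer, using dynamic time warping over per-frame feature vectors. The function names and the use of plain Euclidean frame distances are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def dtw_align(silent_feats, vocalized_feats):
    """Monotonic DTW alignment between two feature sequences.

    silent_feats:    (n, d) array of per-frame features for the silent signal
    vocalized_feats: (m, d) array of per-frame features for the vocalized signal
    Returns an (n,) array giving, for each silent frame, the index of an
    aligned vocalized frame (so targets can be copied across).
    """
    n, m = len(silent_feats), len(vocalized_feats)
    # Pairwise Euclidean distances between all frame pairs (illustrative choice).
    dist = np.linalg.norm(
        silent_feats[:, None, :] - vocalized_feats[None, :, :], axis=-1
    )
    # Accumulated cost matrix with the standard DTW recurrence.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1]
            )
    # Backtrack from the end to recover the optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    # Keep one vocalized index per silent frame (last match along the path).
    alignment = np.zeros(n, dtype=int)
    for si, vj in path:
        alignment[si] = vj
    return alignment

# Hypothetical usage: copy audio targets computed from the vocalized
# recording onto the silent EMG frames via the alignment.
#   idx = dtw_align(silent_emg_feats, vocalized_emg_feats)
#   transferred_targets = vocalized_audio_targets[idx]
```

A model can then be trained directly on (silent EMG, transferred target) pairs instead of only on vocalized data, which is the training setup the abstract contrasts against its baseline.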