Paper Title

Improving Readability for Automatic Speech Recognition Transcription

Authors

Junwei Liao, Sefik Emre Eskimez, Liyang Lu, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

Abstract

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript can still be challenging to read due to grammatical errors, disfluency, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR system alike will be propagated to the next task in the pipeline. In this work, we propose a novel NLP task called ASR post-processing for readability (APR) that aims to transform the noisy ASR output into readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker. In addition, we describe a method to address the lack of task-specific data by synthesizing examples for the APR task using the datasets collected for Grammatical Error Correction (GEC), followed by text-to-speech (TTS) and ASR. Furthermore, we propose metrics borrowed from similar tasks to evaluate performance on the APR task. We compare fine-tuned models based on several open-source and adapted pre-trained models with the traditional pipeline method. Our results suggest that fine-tuned models improve the performance on the APR task significantly, hinting at the potential benefits of using APR systems. We hope that the read, understand, and rewrite approach of our work can serve as a basis that many NLP tasks and human readers can benefit from.
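The data-synthesis idea described above can be sketched as a small function: the grammatically noisy GEC source sentence is spoken by a TTS engine and re-transcribed by an ASR system to produce a realistically noisy input, while the corrected GEC target serves as the readable reference. This is a minimal sketch, not the authors' implementation; the `tts` and `asr` callables are hypothetical stand-ins for real engines, and the toy stubs in the usage example only simulate their effect.

```python
def synthesize_apr_pair(gec_source, gec_target, tts, asr):
    """Build one (noisy input, readable target) APR training example.

    gec_source: the original, error-containing sentence from a GEC corpus
    gec_target: its human-corrected counterpart
    tts, asr:   hypothetical engine callables (text -> audio, audio -> text)
    """
    audio = tts(gec_source)            # speak the noisy sentence
    noisy_transcript = asr(audio)      # re-transcribe it, adding ASR-style noise
    return noisy_transcript, gec_target


if __name__ == "__main__":
    # Toy stand-ins: "audio" is just text, and the fake ASR lowercases
    # and drops punctuation, mimicking typical ASR output.
    fake_tts = lambda text: text
    fake_asr = lambda audio: audio.lower().replace(",", "")

    pair = synthesize_apr_pair(
        "i goes to school, yesterday",
        "I went to school yesterday.",
        fake_tts,
        fake_asr,
    )
    print(pair)
```

In the real pipeline the GEC target is kept untouched, so the model learns to undo both the speaker's grammatical errors (already present in the GEC source) and the recognition noise injected by the TTS+ASR round trip.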
