Paper Title
ASR Error Correction with Constrained Decoding on Operation Prediction
Authors
Abstract
Error correction techniques remain effective for refining the outputs of automatic speech recognition (ASR) models. Existing end-to-end error correction methods based on an encoder-decoder architecture process all tokens in the decoding phase, which introduces undesirable latency. In this paper, we propose an ASR error correction method that utilizes predictions of correction operations. Specifically, we construct a predictor between the encoder and the decoder that learns whether each token should be kept ("K"), deleted ("D"), or changed ("C"), and we restrict decoding to only the part of the input sequence embeddings that corresponds to the "C" tokens, enabling fast inference. Experiments on three public datasets demonstrate the effectiveness of the proposed approach in reducing the latency of the decoding process in ASR correction. Compared with a strong encoder-decoder baseline, our two proposed models speed up inference by at least three times (3.4 and 5.7 times, respectively) while maintaining the same level of accuracy (with WER reductions of 0.53% and 1.69%, respectively). We also produce and release a benchmark dataset for the ASR error correction community to foster further research along this line.
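The K/D/C mechanism described above can be illustrated with a minimal sketch. It assumes the per-token operation labels have already been produced by the predictor, and it replaces the neural decoder with a hypothetical callable `decode_span`. The function name `constrained_correct`, the choice to group consecutive "C" tokens into one span per decoder call, and the toy lookup decoder are all illustrative assumptions, not the authors' implementation. In the actual model, decoding operates on the encoder embeddings of the "C" tokens rather than on raw strings.

```python
from itertools import groupby
from typing import Callable, List, Sequence, Tuple

KEEP, DELETE, CHANGE = "K", "D", "C"


def constrained_correct(
    tokens: Sequence[str],
    ops: Sequence[str],
    decode_span: Callable[[List[str]], List[str]],
) -> Tuple[List[str], int]:
    """Rebuild a corrected sequence from per-token K/D/C labels (illustrative sketch).

    Only runs of "C" tokens are sent to the (hypothetical) decoder; "K" tokens are
    copied as-is and "D" tokens are dropped. Returns the corrected tokens and the
    number of decoder calls made.
    """
    if len(tokens) != len(ops):
        raise ValueError("tokens and ops must have the same length")
    output: List[str] = []
    decoder_calls = 0
    # Group consecutive tokens that share the same operation label.
    for op, group in groupby(zip(tokens, ops), key=lambda pair: pair[1]):
        span = [tok for tok, _ in group]
        if op == KEEP:
            output.extend(span)  # copy unchanged tokens without decoding
        elif op == DELETE:
            continue  # drop the tokens entirely
        elif op == CHANGE:
            output.extend(decode_span(span))  # decode only the changed span
            decoder_calls += 1
        else:
            raise ValueError(f"unknown operation label: {op!r}")
    return output, decoder_calls


# Hypothetical stand-in for the neural decoder: a fixed lookup table.
FIXES = {("eye", "scream"): ["ice", "cream"], ("their",): ["there"]}


def toy_decoder(span: List[str]) -> List[str]:
    return FIXES.get(tuple(span), span)


hypothesis = ["i", "um", "want", "eye", "scream", "over", "their"]
labels = ["K", "D", "K", "C", "C", "K", "C"]
corrected, calls = constrained_correct(hypothesis, labels, toy_decoder)
print(corrected, calls)
```

Because "K" tokens are copied and "D" tokens are dropped without invoking the decoder, the number of decoder calls in this sketch scales with the number of changed spans rather than with the full sequence length, which illustrates why restricting decoding to "C" tokens can reduce latency.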