Paper title
Dynamic Masking for Improved Stability in Spoken Language Translation
Paper authors
Paper abstract
For spoken language translation (SLT) in live scenarios such as conferences, lectures and meetings, it is desirable to show the translation to the user as quickly as possible, avoiding an annoying lag between the speaker and the translated captions. In other words, we would like low-latency, online SLT. If we assume a pipeline of automatic speech recognition (ASR) and machine translation (MT), then a viable approach to online SLT is to pair an online ASR system with a retranslation strategy, where the MT system re-translates every update received from ASR. However, this can result in annoying "flicker" as the MT system updates its translation. A possible solution is to add a fixed delay, or "mask", to the output of the MT system, but a fixed global mask introduces undesirable latency to the output. We show how this mask can be set dynamically, improving the latency-flicker trade-off without sacrificing translation quality.
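To make the masking idea concrete, here is a minimal sketch. The fixed mask withholds the last k tokens of each MT hypothesis; the dynamic variant shown here displays only the longest common token prefix of consecutive hypotheses, masking whatever has changed. The function names and the prefix-agreement heuristic are illustrative assumptions for exposition, not the paper's actual mask predictor.

```python
def fixed_mask(hypothesis: str, k: int) -> str:
    """Withhold ("mask") the last k tokens of an MT hypothesis,
    so that the unstable suffix is never shown to the user."""
    tokens = hypothesis.split()
    return " ".join(tokens[: max(0, len(tokens) - k)])


def dynamic_mask(prev_hyp: str, curr_hyp: str) -> str:
    """Toy dynamic mask: display only the longest common token
    prefix of two consecutive hypotheses. Tokens past the first
    disagreement are treated as unstable and withheld."""
    prev, curr = prev_hyp.split(), curr_hyp.split()
    i = 0
    while i < min(len(prev), len(curr)) and prev[i] == curr[i]:
        i += 1
    return " ".join(curr[:i])


# A fixed mask of 2 always hides two tokens, adding latency even
# when the suffix would not have changed; the dynamic mask hides
# only the region where consecutive hypotheses actually disagree.
print(fixed_mask("the cat sat on the mat", 2))
print(dynamic_mask("the cat sat on", "the cat sat down quietly"))
```

A fixed mask trades latency for stability uniformly; the dynamic version adapts the amount withheld to how much the hypothesis is actually still changing, which is the trade-off the abstract describes.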