Paper Title
Speech Translation and the End-to-End Promise: Taking Stock of Where We Are
Paper Authors
Abstract
Over its three-decade history, speech translation has experienced several shifts in its primary research themes, moving from loosely coupled cascades of speech recognition and machine translation, to exploring questions of tight coupling, and finally to end-to-end models that have recently attracted much attention. This paper provides a brief survey of these developments, along with a discussion of the main challenges of traditional approaches, which stem from committing to intermediate representations from the speech recognizer and from training cascaded models separately towards different objectives. Recent end-to-end modeling techniques promise a principled way of overcoming these issues by allowing joint training of all model components and removing the need for explicit intermediate representations. However, a closer look reveals that many end-to-end models fall short of solving these issues due to compromises made to address data scarcity. This paper provides a unifying categorization and nomenclature that covers both traditional and recent approaches and that may help researchers by highlighting both trade-offs and open research questions.