Paper title

Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?

Authors

Alina Karakanta, Matteo Negri, Marco Turchi

Abstract

Subtitling is becoming increasingly important for disseminating information, given the enormous amounts of audiovisual content becoming available daily. Although Neural Machine Translation (NMT) can speed up the process of translating audiovisual content, large manual effort is still required for transcribing the source language, and for spotting and segmenting the text into proper subtitles. Creating proper subtitles in terms of timing and segmentation highly depends on information present in the audio (utterance duration, natural pauses). In this work, we explore two methods for applying Speech Translation (ST) to subtitling: a) a direct end-to-end and b) a classical cascade approach. We discuss the benefit of having access to the source language speech for improving the conformity of the generated subtitles to the spatial and temporal subtitling constraints and show that length is not the answer to everything in the case of subtitling-oriented ST.
