Paper Title
Efficient Speech Translation with Pre-trained Models
Paper Authors
Paper Abstract
When building state-of-the-art speech translation models, the need for large computational resources is a significant obstacle due to the large training data size and complex models. The availability of pre-trained models is a promising opportunity to build strong speech translation systems efficiently. In a first step, we investigate efficient strategies to build cascaded and end-to-end speech translation systems based on pre-trained models. Using this strategy, we can train and apply the models on a single GPU. While the end-to-end models show translation performance superior to that of cascaded ones, the application of this technology is limited by the need for additional end-to-end training data. In a second step, we propose an additional similarity loss to encourage the model to generate similar hidden representations for speech and transcript. Using this technique, we can increase data efficiency and improve translation quality by 6 BLEU points in scenarios with limited end-to-end training data.
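As a rough illustration of the second step, the similarity loss can be thought of as an auxiliary term that pulls the speech encoder's representation of an utterance toward the text encoder's representation of its transcript. The PyTorch sketch below is a minimal, hypothetical formulation, not the paper's exact implementation: the mean-pooling strategy, the MSE distance, and the weighting coefficient `lambda_sim` are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def similarity_loss(speech_hidden: torch.Tensor, text_hidden: torch.Tensor) -> torch.Tensor:
    """Distance between pooled hidden representations of speech and transcript.

    speech_hidden: (batch, speech_len, dim) encoder output for the audio
    text_hidden:   (batch, text_len, dim)   encoder output for the transcript

    Mean-pooling over time makes the two sequences comparable despite their
    different lengths; other alignment choices are possible and this pooling
    is an assumption of this sketch.
    """
    speech_vec = speech_hidden.mean(dim=1)  # (batch, dim)
    text_vec = text_hidden.mean(dim=1)      # (batch, dim)
    return F.mse_loss(speech_vec, text_vec)

# Hypothetical training objective: the usual translation loss plus the
# similarity term, weighted by a tunable coefficient lambda_sim.
lambda_sim = 1.0
# translation_loss = F.cross_entropy(decoder_logits, target_tokens)
# loss = translation_loss + lambda_sim * similarity_loss(speech_hidden, text_hidden)
```

Because the similarity term is computed from the paired speech and transcript alone, it can exploit ASR-style data without target translations, which is consistent with the data-efficiency gains the abstract reports for limited end-to-end training data.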