AVATAR submission to the Ego4D AV Transcription Challenge

Paper title

AVATAR submission to the Ego4D AV Transcription Challenge

Authors

Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

Abstract

In this report, we describe our submission to the Ego4D AudioVisual (AV) Speech Transcription Challenge 2022. Our pipeline is based on AVATAR, a state-of-the-art encoder-decoder model for audiovisual automatic speech recognition (AV-ASR) that performs early fusion of spectrograms and RGB images. We describe the datasets, experimental settings, and ablations. Our final method achieves a word error rate (WER) of 68.40 on the challenge test set, outperforming the baseline by 43.7% and winning the challenge.
