Paper Title

AVATAR submission to the Ego4D AV Transcription Challenge

Authors

Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

Abstract

In this report, we describe our submission to the Ego4D Audio-Visual (AV) Speech Transcription Challenge 2022. Our pipeline is based on AVATAR, a state-of-the-art encoder-decoder model for AV-ASR that performs early fusion of spectrograms and RGB images. We describe the datasets, experimental settings and ablations. Our final method achieves a word error rate (WER) of 68.40 on the challenge test set, outperforming the baseline by 43.7% and winning the challenge.
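For reference, WER is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words: WER = 100 * (S + D + I) / N, where S, D and I count substitutions, deletions and insertions. The following is a minimal sketch of that standard definition, not the official challenge scorer. It assumes plain whitespace tokenization with no text normalization, which may differ from the official evaluation, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    Hypothetical helper: whitespace tokenization only, no text
    normalization (the official challenge scorer may normalize text).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum number of edits turning ref[:i-1] into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against the number of reference words, WER is not bounded by 100. This is one reason scores in the high 60s can still be competitive on difficult, unconstrained egocentric audio.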