Paper Title
UAVM: Towards Unifying Audio and Visual Models
Authors
Abstract
Conventional audio-visual models have independent audio and video branches. In this work, we unify the audio and visual branches by designing a Unified Audio-Visual Model (UAVM). The UAVM achieves a new state-of-the-art audio-visual event classification accuracy of 65.8% on VGGSound. More interestingly, we also find a few intriguing properties of UAVM that the modality-independent counterparts do not have.