UAVM: Towards Unifying Audio and Visual Models

Title

UAVM: Towards Unifying Audio and Visual Models

Authors

Gong, Yuan; Liu, Alexander H.; Rouditchenko, Andrew; Glass, James

Abstract

Conventional audio-visual models have independent audio and video branches. In this work, we unify the audio and visual branches by designing a Unified Audio-Visual Model (UAVM). The UAVM achieves a new state-of-the-art audio-visual event classification accuracy of 65.8% on VGGSound. More interestingly, we also find a few intriguing properties of UAVM that the modality-independent counterparts do not have.
