Paper Title

Probing Visual-Audio Representation for Video Highlight Detection via Hard-Pairs Guided Contrastive Learning

Paper Authors

Shuaicheng Li, Feng Zhang, Kunlin Yang, Lingbo Liu, Shinan Liu, Jun Hou, Shuai Yi

Paper Abstract

Video highlight detection is a crucial yet challenging problem that aims to identify the interesting moments in untrimmed videos. The key to this task lies in effective video representations that jointly pursue two goals, i.e., cross-modal representation learning and fine-grained feature discrimination. In this paper, these two challenges are tackled not only by enriching intra-modality and cross-modality relations for representation modeling but also by shaping the features in a discriminative manner. Our proposed method mainly leverages intra-modality encoding and cross-modality co-occurrence encoding for full representation modeling. Specifically, intra-modality encoding augments the modality-wise features and dampens irrelevant modalities via within-modality relation learning on both audio and visual signals. Meanwhile, cross-modality co-occurrence encoding focuses on co-occurrence inter-modality relations and selectively captures effective information across modalities. The multi-modal representation is further enhanced by global information abstracted from the local context. In addition, we enlarge the discriminative power of the feature embeddings with a hard-pairs guided contrastive learning (HPCL) scheme, in which a hard-pairs sampling strategy is employed to mine hard samples and improve feature discrimination. Extensive experiments conducted on two benchmarks demonstrate the effectiveness and superiority of our proposed method over other state-of-the-art methods.
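The abstract describes two attention-style encoding stages. Below is a minimal PyTorch sketch of how intra-modality encoding (self-attention within each modality) and cross-modality co-occurrence encoding (cross-attention between the visual and audio streams, plus a pooled global context) could be realized. The module names, dimensions, and fusion details are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the two encoding stages; names and fusion
# details are assumptions made for illustration, not the paper's code.
import torch
import torch.nn as nn

class IntraModalityEncoder(nn.Module):
    """Self-attention within one modality to augment modality-wise features."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (B, T, d) clip features
        out, _ = self.attn(x, x, x)            # within-modality relations
        return self.norm(x + out)              # residual keeps original signal

class CrossModalCoOccurrence(nn.Module):
    """Cross-attention capturing co-occurrence relations between modalities."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.v2a = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.a2v = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, vis, aud):               # each (B, T, d)
        vis_ctx, _ = self.v2a(vis, aud, aud)   # visual queries attend to audio
        aud_ctx, _ = self.a2v(aud, vis, vis)   # audio queries attend to visual
        fused = torch.cat([vis + vis_ctx, aud + aud_ctx], dim=-1)
        # Global information abstracted from local context, broadcast per clip
        global_ctx = fused.mean(dim=1, keepdim=True).expand_as(fused)
        return fused + global_ctx

if __name__ == "__main__":
    vis, aud = torch.randn(2, 32, 256), torch.randn(2, 32, 256)
    enc_v, enc_a = IntraModalityEncoder(), IntraModalityEncoder()
    fused = CrossModalCoOccurrence()(enc_v(vis), enc_a(aud))
    print(fused.shape)  # torch.Size([2, 32, 512])
```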
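Similarly, the hard-pairs guided contrastive learning (HPCL) scheme can be pictured as an InfoNCE-style loss restricted to mined hard negatives. The sketch below assumes per-clip embeddings with binary highlight labels and uses a top-k similarity rule for hard-pair mining; the paper's exact sampling strategy and loss form may differ.

```python
# Hedged sketch of HPCL: contrastive loss where each anchor contrasts its
# positives against only the k most similar (hardest) negatives. The top-k
# mining rule is an assumption for illustration.
import torch
import torch.nn.functional as F

def hard_pairs_contrastive_loss(feats, labels, k=8, tau=0.1):
    """feats: (N, d) clip embeddings; labels: (N,) 1 = highlight, 0 = not."""
    feats = F.normalize(feats, dim=-1)
    sim = feats @ feats.t() / tau                     # (N, N) scaled cosine sims
    pos_mask = (labels[:, None] == labels[None, :]).float()
    pos_mask.fill_diagonal_(0)                        # exclude self-pairs
    neg_mask = 1.0 - pos_mask
    neg_mask.fill_diagonal_(0)

    # Hard-pairs mining: keep only the k most similar negatives per anchor.
    neg_sim = sim.masked_fill(neg_mask == 0, float('-inf'))
    hard_idx = neg_sim.topk(k=min(k, feats.size(0) - 1), dim=1).indices
    hard_mask = torch.zeros_like(neg_mask).scatter_(1, hard_idx, 1.0) * neg_mask

    exp_sim = sim.exp()
    numer = (exp_sim * pos_mask).sum(dim=1)           # attract same-label clips
    denom = (exp_sim * (pos_mask + hard_mask)).sum(dim=1)
    loss = -torch.log(numer / denom.clamp(min=1e-8))
    return loss[pos_mask.sum(dim=1) > 0].mean()       # anchors with >=1 positive

if __name__ == "__main__":
    feats = torch.randn(16, 512)
    labels = torch.randint(0, 2, (16,))
    print(hard_pairs_contrastive_loss(feats, labels).item())
```

Restricting the denominator to mined hard negatives concentrates the gradient on the clips that are most easily confused with highlights, which is the stated motivation for the hard-pairs sampling strategy.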
