Paper Title

Exploiting Feature Diversity for Make-up Temporal Video Grounding

Authors

Xiujun Shu, Wei Wen, Taian Guo, Sunan He, Chen Wu, Ruizhi Qiao

Abstract

This technical report presents the 3rd-place winning solution for MTVG, a new task introduced in the 4th Person in Context (PIC) Challenge at ACM MM 2022. MTVG aims at localizing the temporal boundary of a step in an untrimmed video based on a textual description. The biggest challenge of this task is the fine-grained video-text semantics of make-up steps. However, current methods mainly extract video features using action-based pre-trained models. As actions are more coarse-grained than make-up steps, action-based features are not sufficient to provide fine-grained cues. To address this issue, we propose to achieve fine-grained representation by exploiting feature diversity. Specifically, we propose a series of methods spanning feature extraction, network optimization, and model ensembling. As a result, we achieved 3rd place in the MTVG competition.
