Paper Title

An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022

Authors

Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan

Abstract

This technical report describes the CONE approach for the Ego4D Natural Language Queries (NLQ) Challenge at ECCV 2022. We leverage our model CONE, an efficient window-centric COarse-to-fiNE alignment framework. Specifically, CONE dynamically slices a long video into candidate windows via a sliding-window approach. Centering on these windows, CONE (1) learns the inter-window (coarse-grained) semantic variance through contrastive learning and speeds up inference by pre-filtering the candidate windows relevant to the NL query, and (2) ranks intra-window (fine-grained) candidate moments using the powerful multi-modal alignment ability of EgoVLP, a contrastive vision-text pre-trained model. On the blind test set, CONE achieves 15.26 and 9.24 for R1@IoU=0.3 and R1@IoU=0.5, respectively.
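
The coarse stage described in the abstract (sliding-window slicing followed by query-based pre-filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mean-pooled window representation, and the cosine-similarity score are all assumptions chosen for illustration, whereas CONE itself learns the window-query matching through contrastive training. The fine-grained intra-window ranking stage is not sketched here.

```python
import math


def slice_windows(num_frames, window_len, stride):
    """Return (start, end) frame index pairs; end is exclusive."""
    if num_frames <= window_len:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window_len + 1, stride))
    # Add one final window so the tail of the video is always covered.
    if starts[-1] + window_len < num_frames:
        starts.append(num_frames - window_len)
    return [(s, s + window_len) for s in starts]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pool(frames):
    """Average per-frame feature vectors into one window vector (assumption)."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def prefilter_windows(frame_feats, query_feat, window_len, stride, top_k):
    """Keep the top_k windows whose pooled feature best matches the query.

    Hypothetical stand-in for the coarse stage: a learned window-query
    scorer would replace the plain cosine similarity used here.
    """
    windows = slice_windows(len(frame_feats), window_len, stride)
    scored = [
        (cosine(mean_pool(frame_feats[s:e]), query_feat), (s, e))
        for s, e in windows
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [window for _, window in scored[:top_k]]
```

Only the windows returned by `prefilter_windows` would then be passed to the more expensive fine-grained ranking, which is where the inference speed-up mentioned in the abstract comes from.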
