Paper Title


InstanceFormer: An Online Video Instance Segmentation Framework

Authors

Rajat Koner, Tanveer Hannan, Suprosanna Shit, Sahand Sharifzadeh, Matthias Schubert, Thomas Seidl, Volker Tresp

Abstract


Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full spatio-temporal attention limit them in real-life applications such as processing lengthy videos. In this paper, we propose a single-stage transformer-based efficient online VIS framework named InstanceFormer, which is especially suitable for long and challenging videos. We propose three novel components to model short-term and long-term dependency and temporal coherence. First, we propagate the representation, location, and semantic information of prior instances to model short-term changes. Second, we propose a novel memory cross-attention in the decoder, which allows the network to look into earlier instances within a certain temporal window. Finally, we employ a temporal contrastive loss to impose coherence in the representation of an instance across all frames. Memory attention and temporal coherence are particularly beneficial to long-range dependency modeling, including challenging scenarios like occlusion. The proposed InstanceFormer outperforms previous online benchmark methods by a large margin across multiple datasets. Most importantly, InstanceFormer surpasses offline approaches for challenging and long datasets such as YouTube-VIS-2021 and OVIS. Code is available at https://github.com/rajatkoner08/InstanceFormer.
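
The abstract describes a memory cross-attention in the decoder that lets instance queries look back at instances from a recent temporal window. The snippet below is a minimal, self-contained sketch of that idea only; it is not the released implementation, and the layer sizes, the deque-based memory, and the single attention layer are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code): current-frame instance
# queries cross-attend over instance embeddings remembered from the last
# `window` frames, then the result is stored back into the memory.
from collections import deque

import torch
import torch.nn as nn


class MemoryCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8, window: int = 5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # instance embeddings of the last `window` frames (hypothetical memory layout)
        self.memory = deque(maxlen=window)

    def forward(self, queries: torch.Tensor) -> torch.Tensor:
        # queries: (batch, num_instances, dim) instance queries of the current frame
        if self.memory:
            mem = torch.cat(list(self.memory), dim=1)   # (batch, window * num_instances, dim)
            attended, _ = self.attn(queries, mem, mem)  # cross-attend into past instances
            queries = self.norm(queries + attended)     # residual connection + layer norm
        self.memory.append(queries.detach())            # remember this frame for later ones
        return queries


if __name__ == "__main__":
    layer = MemoryCrossAttention()
    for _ in range(3):                      # simulate three consecutive frames
        q = torch.randn(1, 10, 256)         # 10 instance queries per frame
        print(layer(q).shape)               # torch.Size([1, 10, 256])
```

In this sketch the memory holds detached embeddings so gradients only flow through the current frame; how the actual model propagates and supervises the memory is defined in the linked repository.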
