Paper Title
See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks
Paper Authors
Paper Abstract
We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of the inherent correlation among video frames and incorporate a global co-attention mechanism to further improve state-of-the-art deep-learning-based solutions, which primarily focus on learning discriminative foreground representations over appearance and motion within short-term temporal segments. The co-attention layers in our network provide efficient and competent stages for capturing global correlations and scene context by jointly computing and appending co-attention responses into a joint feature space. We train COSNet with pairs of video frames, which naturally augments the training data and increases the learning capacity. During the segmentation stage, the co-attention model encodes useful information by processing multiple reference frames together, which is leveraged to better infer the frequently reappearing and salient foreground objects. We propose a unified and end-to-end trainable framework in which different co-attention variants can be derived for mining the rich context within videos. Our extensive experiments on three large benchmarks demonstrate that COSNet outperforms the current alternatives by a large margin.
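To make the mechanism concrete, the following is a minimal PyTorch sketch of a vanilla co-attention layer between the feature maps of two frames, following the general recipe the abstract describes: a learnable affinity between all position pairs across the two frames, softmax normalization, and appending the resulting co-attention response to the original features. The class name `CoAttention`, the bilinear weight, and the 1x1 fusion convolution are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttention(nn.Module):
    """Minimal co-attention sketch (assumed layer sizes, not the paper's
    exact configuration): computes an affinity S = Xb^T W Xa between all
    position pairs of two frames, normalizes it with a softmax, and fuses
    the resulting summary of frame b back into frame a's features."""

    def __init__(self, channels):
        super().__init__()
        # Learnable weight W for the bilinear affinity (assumption).
        self.weight = nn.Linear(channels, channels, bias=False)
        # 1x1 conv fusing [co-attention response, original features].
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C, H, W) feature maps of the two frames.
        b, c, h, w = feat_a.shape
        xa = feat_a.flatten(2)                  # (B, C, HW_a)
        xb = feat_b.flatten(2)                  # (B, C, HW_b)
        # Affinity between every position pair across the two frames.
        s = torch.bmm(self.weight(xb.transpose(1, 2)), xa)  # (B, HW_b, HW_a)
        # Normalize over frame-b positions, then summarize frame b
        # for every position of frame a.
        attn_b = F.softmax(s, dim=1)
        za = torch.bmm(xb, attn_b).view(b, c, h, w)
        # Append the co-attention response to frame a's original features.
        return self.fuse(torch.cat([za, feat_a], dim=1))
```

In a Siamese setup the same layer, with shared weights, would be applied symmetrically to both frames of a training pair; at inference, summaries computed against several reference frames could be aggregated (e.g. averaged) before fusion, in the spirit of the multi-reference segmentation stage the abstract describes.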