Paper Title
Unsupervised Video Object Segmentation via Prototype Memory Network
Paper Authors
Paper Abstract
Unsupervised video object segmentation aims to segment a target object in a video without a ground truth mask for the initial frame. This challenging task requires extracting features of the most salient common objects within a video sequence. This difficulty can be alleviated by using motion information such as optical flow, but relying only on information between adjacent frames yields poor connectivity between distant frames and thus poor performance. To solve this problem, we propose a novel prototype memory network architecture. The proposed model effectively extracts RGB and motion information by extracting superpixel-based component prototypes from the input RGB images and optical flow maps. In addition, the model scores the usefulness of the component prototypes in each frame based on a self-learning algorithm, adaptively storing the most useful prototypes in memory and discarding obsolete ones. We use the prototypes in the memory bank to predict the next query frame's mask, which strengthens the association between distant frames and helps with accurate mask prediction. Our method is evaluated on three datasets, achieving state-of-the-art performance. We demonstrate the effectiveness of the proposed model through various ablation studies.
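The adaptive memory described above (score each frame's component prototypes, keep the most useful, discard obsolete ones, and read from memory to predict the query mask) can be illustrated with a minimal sketch. All class and method names here are hypothetical, not the paper's actual implementation, and the scoring is assumed to be given externally:

```python
import numpy as np

class PrototypeMemoryBank:
    """Minimal sketch of a score-based prototype memory (hypothetical API).

    Prototypes carry a usefulness score; when capacity is exceeded,
    the lowest-scoring (obsolete) prototypes are discarded.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity   # maximum number of stored prototypes
        self.prototypes = []       # list of (vector, score) pairs

    def update(self, new_prototypes, scores):
        # Merge this frame's component prototypes with the stored ones,
        # then keep only the `capacity` most useful prototypes.
        self.prototypes.extend(zip(new_prototypes, scores))
        self.prototypes.sort(key=lambda p: p[1], reverse=True)
        self.prototypes = self.prototypes[: self.capacity]

    def read(self, query_feature):
        # Aggregate memory evidence for the query frame via a softmax
        # over cosine similarities between the query and each prototype.
        protos = np.stack([p[0] for p in self.prototypes])
        sims = protos @ query_feature / (
            np.linalg.norm(protos, axis=1) * np.linalg.norm(query_feature) + 1e-8
        )
        weights = np.exp(sims) / np.exp(sims).sum()
        return weights @ protos
```

In this toy form, `update` plays the role of the self-learned storing/discarding step, and `read` stands in for the memory lookup that strengthens associations between distant frames.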