A Transductive Approach for Video Object Segmentation

Paper title

A Transductive Approach for Video Object Segmentation

Authors

Yizhuo Zhang, Zhirong Wu, Houwen Peng, Stephen Lin

Abstract

Semi-supervised video object segmentation aims to separate a target object from a video sequence, given its mask in the first frame. Most current prevailing methods use information from additional modules trained in other domains, such as optical flow and instance segmentation, and as a result they do not compete with other methods on common ground. To address this issue, we propose a simple yet strong transductive method that needs no additional modules, datasets, or dedicated architectural designs. Our method takes a label propagation approach in which pixel labels are passed forward based on feature similarity in an embedding space. Unlike other propagation methods, ours diffuses temporal information in a holistic manner that takes long-term object appearance into account. In addition, our method adds little computational overhead and runs at a fast ~37 fps. Our single model with a vanilla ResNet50 backbone achieves an overall score of 72.3 on the DAVIS 2017 validation set and 63.1 on the test set. This simple, high-performing, and efficient method can serve as a solid baseline that facilitates future research. Code and models are available at https://github.com/microsoft/transductive-vos.pytorch.
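
The label propagation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that similarity is a dot product between pixel embeddings, that it is normalized with a softmax, and that labels are transferred as a weighted average. The function name `propagate_labels` and the `temperature` parameter are hypothetical.

```python
import numpy as np


def propagate_labels(ref_feats, ref_labels, query_feats, temperature=1.0):
    """Transfer soft labels from reference pixels to query pixels.

    Hypothetical helper for illustration only.
    ref_feats:   (N, D) embeddings of already-labeled pixels, which may be
                 pooled from several earlier frames.
    ref_labels:  (N, C) one-hot or soft labels of those pixels.
    query_feats: (M, D) embeddings of pixels in the frame to be labeled.
    Returns an (M, C) array of soft labels whose rows sum to 1.
    """
    # Pairwise dot-product similarity between query and reference pixels.
    sim = query_feats @ ref_feats.T / temperature  # shape (M, N)
    # Numerically stable softmax over the reference pixels.
    sim -= sim.max(axis=1, keepdims=True)
    weights = np.exp(sim)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each query pixel receives a similarity-weighted average of the labels.
    return weights @ ref_labels


# Toy example: two reference pixels with distinct labels, two query pixels.
ref_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
ref_labels = np.eye(2)
query_feats = np.array([[0.9, 0.1], [0.2, 0.8]])
soft = propagate_labels(ref_feats, ref_labels, query_feats, temperature=0.1)
pred = soft.argmax(axis=1)
```

In this toy case each query pixel takes the label of the reference pixel it most resembles. Pooling reference pixels from many earlier frames, rather than only the previous one, is one simple way to let long-term appearance influence the result.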

