Unsupervised object-centric video generation and decomposition in 3D

Paper title

Unsupervised object-centric video generation and decomposition in 3D

Authors

Paul Henderson, Christoph H. Lampert

Abstract

A natural approach to generative modeling of videos is to represent them as a composition of moving objects. Recent works model a set of 2D sprites over a slowly-varying background, but without considering the underlying 3D scene that gives rise to them. We instead propose to model a video as the view seen while moving through a scene with multiple 3D objects and a 3D background. Our model is trained from monocular videos without any supervision, yet learns to generate coherent 3D scenes containing several moving objects. We conduct detailed experiments on two datasets, going beyond the visual complexity supported by state-of-the-art generative approaches. We evaluate our method on depth prediction and 3D object detection (tasks which cannot be addressed by those earlier works) and show that it outperforms them even on 2D instance segmentation and tracking.
