Learning 3D Scene Priors with 2D Supervision

Paper Title

Learning 3D Scene Priors with 2D Supervision

Authors

Yinyu Nie, Angela Dai, Xiaoguang Han, Matthias Nießner

Abstract

Holistic 3D scene understanding entails estimating both layout configuration and object geometry in a 3D environment. Recent works have shown advances in 3D scene estimation from various input modalities (e.g., images, 3D scans) by leveraging 3D supervision (e.g., 3D bounding boxes or CAD models), which is expensive and often intractable to collect at scale. To address this shortcoming, we propose a new method to learn 3D scene priors of layout and shape without requiring any 3D ground truth. Instead, we rely on 2D supervision from multi-view RGB images. Our method represents a 3D scene as a latent vector, which we progressively decode into a sequence of objects characterized by their class categories, 3D bounding boxes, and meshes. With our trained autoregressive decoder representing the scene prior, our method facilitates many downstream applications, including scene synthesis, interpolation, and single-view reconstruction. Experiments on 3D-FRONT and ScanNet show that our method outperforms the state of the art in single-view reconstruction and achieves state-of-the-art results in scene synthesis against baselines that require 3D supervision.
