Paper Title

CoReNet: Coherent 3D scene reconstruction from a single RGB image

Paper Authors

Stefan Popov, Pablo Bauszat, Vittorio Ferrari

Paper Abstract

Advances in deep learning techniques have allowed recent work to reconstruct the shape of a single object given only one RGB image as input. Building on common encoder-decoder architectures for this task, we propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner; (2) a hybrid 3D volume representation that enables building translation equivariant models, while at the same time encoding fine object details without an excessive memory footprint; (3) a reconstruction loss tailored to capture overall object geometry. Furthermore, we adapt our model to address the harder task of reconstructing multiple objects from a single image. We reconstruct all objects jointly in one pass, producing a coherent reconstruction, where all objects live in a single consistent 3D coordinate frame relative to the camera and they do not intersect in 3D space. We also handle occlusions and resolve them by hallucinating the missing object parts in the 3D volume. We validate the impact of our contributions experimentally both on synthetic data from ShapeNet and on real images from Pix3D. Our method improves over the state-of-the-art single-object methods on both datasets. Finally, we evaluate performance quantitatively on multiple object reconstruction with synthetic scenes assembled from ShapeNet objects.
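The core idea behind the ray-traced skip connections can be illustrated with a minimal NumPy sketch: each voxel center of the output grid is projected through a pinhole camera back into the image plane, and the 2D encoder feature it lands on is copied into the 3D volume. This is a simplified illustration only, not the paper's implementation — the `lift_features` name, the hard-coded grid bounds, and the nearest-neighbor sampling are assumptions; CoReNet's actual connections use learned features and a more careful sampling scheme.

```python
import numpy as np

def lift_features(feat2d, K, res=16):
    """Lift a 2D feature map into a 3D voxel volume by projecting
    voxel centers into the image (nearest-neighbor sampling).

    feat2d: (H, W, C) feature map from a 2D encoder.
    K:      (3, 3) pinhole camera intrinsics.
    Returns a (res, res, res, C) volume of lifted features.
    """
    H, W, C = feat2d.shape
    # Illustrative voxel grid in camera coordinates (assumed bounds):
    # x, y in [-0.5, 0.5], depth z in [1, 2] so all voxels are in front
    # of the camera.
    xt = np.linspace(-0.5, 0.5, res)
    yt = np.linspace(-0.5, 0.5, res)
    zt = np.linspace(1.0, 2.0, res)
    zs, ys, xs = np.meshgrid(zt, yt, xt, indexing="ij")
    pts = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)  # (res^3, 3)
    # Pinhole projection: u = fx*x/z + cx, v = fy*y/z + cy.
    uv = (K @ pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    # Nearest-neighbor sample, clamped to the image border.
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    return feat2d[v, u].reshape(res, res, res, C)
```

Because the same 2D feature is copied to every voxel along a viewing ray, the 3D decoder that consumes this volume can decide at which depth along the ray the feature actually belongs — which is what makes the propagation physically consistent with the image formation.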
