Paper Title


MonoLayout: Amodal scene layout from a single image

Authors

Kaustubh Mani, Swapnil Daga, Shubhika Garg, N. Sai Shankar, Krishna Murthy Jatavallabhula, K. Madhava Krishna

Abstract


In this paper, we address the novel, highly challenging problem of estimating the layout of a complex urban driving scenario. Given a single color image captured from a driving platform, we aim to predict the bird's-eye view layout of the road and other traffic participants. The estimated layout should reason beyond what is visible in the image, and compensate for the loss of 3D information due to projection. We dub this problem amodal scene layout estimation, which involves "hallucinating" scene layout even for parts of the world that are occluded in the image. To this end, we present MonoLayout, a deep neural network for real-time amodal scene layout estimation from a single image. We represent scene layout as a multi-channel semantic occupancy grid, and leverage adversarial feature learning to hallucinate plausible completions for occluded image parts. Due to the lack of fair baseline methods, we extend several state-of-the-art approaches for road-layout estimation and vehicle occupancy estimation in bird's-eye view to the amodal setup for rigorous evaluation. By leveraging temporal sensor fusion to generate training labels, we significantly outperform current art over a number of datasets. On the KITTI and Argoverse datasets, we outperform all baselines by a significant margin. We also make all our annotations and code publicly available. A video abstract of this paper is available at https://www.youtube.com/watch?v=HcroGyo6yRQ .
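As a rough illustration of the output representation the abstract describes (not the authors' code), a multi-channel semantic occupancy grid can be sketched as a tensor with one channel per semantic class, where each cell holds the occupancy probability of a ground patch in the bird's-eye-view plane. The grid extent, cell resolution, and class set below are assumptions chosen for the sketch:

```python
import numpy as np

# Hypothetical BEV extent: 40 m forward, +/-20 m laterally,
# discretized into 0.5 m cells -> an 80 x 80 grid (assumed values).
GRID_SIZE = 80                  # cells per side
CELL_METERS = 0.5               # metres per cell
CLASSES = ("road", "vehicle")   # one channel per semantic class

def make_layout_grid():
    """Empty multi-channel occupancy grid: channel c, cell (i, j) holds
    the probability that the corresponding ground patch contains class c."""
    return np.zeros((len(CLASSES), GRID_SIZE, GRID_SIZE), dtype=np.float32)

def mark_occupied(grid, cls, x_m, z_m, half_extent_m, prob=1.0):
    """Rasterize an axis-aligned square (centre x_m lateral, z_m forward,
    half-size in metres) into the channel for `cls`."""
    c = CLASSES.index(cls)
    to_cell = lambda m, offset: int(round(m / CELL_METERS + offset))
    i0 = max(to_cell(z_m - half_extent_m, 0), 0)
    i1 = min(to_cell(z_m + half_extent_m, 0), GRID_SIZE)
    j0 = max(to_cell(x_m - half_extent_m, GRID_SIZE / 2), 0)
    j1 = min(to_cell(x_m + half_extent_m, GRID_SIZE / 2), GRID_SIZE)
    grid[c, i0:i1, j0:j1] = np.maximum(grid[c, i0:i1, j0:j1], prob)
    return grid

grid = make_layout_grid()
mark_occupied(grid, "road", x_m=0.0, z_m=20.0, half_extent_m=20.0)    # road band
mark_occupied(grid, "vehicle", x_m=2.0, z_m=10.0, half_extent_m=1.0)  # one car
```

A network in this setting would regress such a tensor directly from the monocular image; amodal estimation means occluded cells (e.g. road behind the car) are still filled in.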
