DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

Paper title

DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

Authors

Yilun Chen, Shijia Huang, Shu Liu, Bei Yu, Jiaya Jia

Abstract

Camera-based 3D object detectors are attractive because cameras are cheaper and more widely deployed than LiDAR sensors. We first revisit the prior stereo detector DSGN and the way it constructs stereo volumes to represent both 3D geometry and semantics. We polish the stereo modeling and propose an advanced version, DSGN++, which aims to enhance effective information flow throughout the 2D-to-3D pipeline in three main aspects. First, to effectively lift 2D information into the stereo volume, we propose depth-wise plane sweeping (DPS), which allows denser connections and extracts depth-guided features. Second, to grasp differently spaced features, we present a novel stereo volume, the Dual-view Stereo Volume (DSV), which integrates front-view and top-view features and reconstructs sub-voxel depth in the camera frustum. Third, as the foreground region becomes less dominant in 3D space, we propose a multi-modal data editing strategy, Stereo-LiDAR Copy-Paste, which ensures cross-modal alignment and improves data efficiency. Without bells and whistles, extensive experiments in various modality setups on the popular KITTI benchmark show that our method consistently outperforms other camera-based 3D detectors for all categories. Code is available at https://github.com/chenyilun95/DSGN2.
