Paper Title

Center Feature Fusion: Selective Multi-Sensor Fusion of Center-based Objects

Paper Authors

Philip Jacobson, Yiyang Zhou, Wei Zhan, Masayoshi Tomizuka, Ming C. Wu

Paper Abstract

Leveraging multi-modal fusion, especially between camera and LiDAR, has become essential for building accurate and robust 3D object detection systems for autonomous vehicles. Until recently, point decorating approaches, in which point clouds are augmented with camera features, have been the dominant approach in the field. However, these approaches fail to utilize the higher-resolution images from cameras. Recent works projecting camera features to the bird's-eye-view (BEV) space for fusion have also been proposed; however, they require projecting millions of pixels, most of which contain only background information. In this work, we propose a novel approach, Center Feature Fusion (CFF), in which we leverage center-based detection networks in both the camera and LiDAR streams to identify relevant object locations. We then use these center-based detections to select the pixel features relevant to object locations, a small fraction of the total number in the image. These are then projected and fused in the BEV frame. On the nuScenes dataset, we outperform the LiDAR-only baseline by 4.9% mAP while fusing up to 100x fewer features than other fusion methods.
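The selective projection step described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical Python/PyTorch example, not the paper's actual implementation: it assumes synthetic inputs, a given per-pixel depth map, a simplified pinhole camera model, additive fusion, and an ego-centred BEV grid; all function names, tensor shapes, and grid parameters are illustrative assumptions.

import torch

def select_center_pixels(heatmap: torch.Tensor, k: int = 100):
    """Pick the k highest-scoring pixels from a center heatmap of shape (1, H, W)."""
    _, H, W = heatmap.shape
    scores, idx = heatmap.flatten().topk(k)
    ys, xs = idx // W, idx % W
    return xs, ys

def project_to_bev(xs, ys, depth, intrinsics, bev_size=128, bev_range=51.2):
    """Back-project selected pixels to 3D using per-pixel depth (assumed given),
    then quantize onto a BEV grid spanning [-bev_range, bev_range] m laterally
    and [0, 2 * bev_range] m forward. A simplified pinhole model, no extrinsics."""
    fx, fy, cx, cy = intrinsics
    d = depth[ys, xs]                       # depth of each selected pixel (m)
    X = (xs.float() - cx) / fx * d          # lateral camera-frame coordinate
    u = ((X + bev_range) / (2 * bev_range) * bev_size).long().clamp(0, bev_size - 1)
    v = (d / (2 * bev_range) * bev_size).long().clamp(0, bev_size - 1)
    return u, v

def fuse_into_bev(cam_feats, xs, ys, u, v, lidar_bev):
    """Scatter the selected camera features into the LiDAR BEV map (additive fusion)."""
    fused = lidar_bev.clone()
    fused[:, v, u] += cam_feats[:, ys, xs]
    return fused

if __name__ == "__main__":
    C, H, W, B = 64, 112, 200, 128
    cam_feats = torch.randn(C, H, W)        # camera backbone features
    heatmap = torch.rand(1, H, W)           # center-head heatmap
    depth = torch.rand(H, W) * 50.0         # hypothetical per-pixel depth (m)
    lidar_bev = torch.randn(C, B, B)        # LiDAR BEV features
    xs, ys = select_center_pixels(heatmap, k=100)
    u, v = project_to_bev(xs, ys, depth, intrinsics=(500.0, 500.0, W / 2, H / 2), bev_size=B)
    fused = fuse_into_bev(cam_feats, xs, ys, u, v, lidar_bev)
    print(fused.shape)  # only ~100 pixels projected, vs. H * W = 22400 in total

The point of the sketch is the claimed efficiency mechanism: only the top-k center-heatmap pixels are projected into the BEV frame, rather than the full feature map, which is where the "up to 100x fewer features" figure comes from.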
