Paper Title

SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using LiDAR Point Cloud and Semantic Segmentation

Paper Authors

Juncong Fei, Wenbo Chen, Philipp Heidenreich, Sascha Wirges, Christoph Stiller

Paper Abstract

3D pedestrian detection is a challenging task in automated driving because pedestrians are relatively small, frequently occluded and easily confused with narrow vertical objects. LiDAR and camera are two commonly used sensor modalities for this task, which should provide complementary information. Unexpectedly, LiDAR-only detection methods tend to outperform multisensor fusion methods in public benchmarks. Recently, PointPainting has been presented to eliminate this performance drop by effectively fusing the output of a semantic segmentation network instead of the raw image information. In this paper, we propose a generalization of PointPainting to be able to apply fusion at different levels. After the semantic augmentation of the point cloud, we encode raw point data in pillars to get geometric features and semantic point data in voxels to get semantic features and fuse them in an effective way. Experimental results on the KITTI test set show that SemanticVoxels achieves state-of-the-art performance in both 3D and bird's eye view pedestrian detection benchmarks. In particular, our approach demonstrates its strength in detecting challenging pedestrian cases and outperforms current state-of-the-art approaches.
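
The semantic augmentation step described in the abstract follows the PointPainting idea: each LiDAR point is projected into the image plane and the per-pixel class scores of a semantic segmentation network are appended to its raw features. The sketch below illustrates this step only; the function name `paint_points`, the argument layout, and the toy calibration values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def paint_points(points, seg_scores, lidar_to_cam, cam_intrinsics):
    """Append per-pixel semantic scores to LiDAR points (PointPainting-style).

    points         : (N, 4) array of [x, y, z, reflectance] in the LiDAR frame
    seg_scores     : (H, W, C) per-pixel class scores from a segmentation network
    lidar_to_cam   : (4, 4) homogeneous transform from the LiDAR to the camera frame
    cam_intrinsics : (3, 3) camera projection matrix
    Returns an (M, 4 + C) array of painted points; points that do not project
    into the image are simply dropped in this simplified sketch.
    """
    H, W, C = seg_scores.shape

    # Transform points into the camera frame using homogeneous coordinates.
    xyz1 = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])
    cam_pts = (lidar_to_cam @ xyz1.T).T[:, :3]

    # Keep only points in front of the camera.
    in_front = cam_pts[:, 2] > 1e-3
    points, cam_pts = points[in_front], cam_pts[in_front]

    # Project onto the image plane and round to pixel indices.
    uvw = (cam_intrinsics @ cam_pts.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(np.int64)
    v = (uvw[:, 1] / uvw[:, 2]).astype(np.int64)

    # Discard points that fall outside the image bounds.
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    points, u, v = points[valid], u[valid], v[valid]

    # Look up the segmentation scores at each projected pixel and concatenate
    # them with the raw point features.
    semantics = seg_scores[v, u]                  # (M, C)
    return np.hstack([points, semantics])         # (M, 4 + C)


if __name__ == "__main__":
    # Toy example with random data, only to illustrate the tensor shapes.
    rng = np.random.default_rng(0)
    pts = rng.uniform(-1.0, 1.0, size=(1000, 4))
    pts[:, 2] += 5.0                              # place points in front of the camera
    scores = rng.random((375, 1242, 4))           # e.g. 4 classes incl. pedestrian
    T = np.eye(4)                                 # identity extrinsics for the toy case
    K = np.array([[700.0, 0.0, 620.0],
                  [0.0, 700.0, 187.0],
                  [0.0, 0.0, 1.0]])
    painted = paint_points(pts, scores, T, K)
    print(painted.shape)                          # (M, 8): x, y, z, reflectance + 4 scores
```

In SemanticVoxels, the raw features of these painted points are then encoded in pillars to obtain geometric features and the appended semantic channels in voxels to obtain semantic features, before the two feature maps are fused; that encoding and fusion stage is not shown in this sketch.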
