Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics

Paper Title

Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics

Authors

Ancheng Lin, Jun Li, Yusheng Xiang, Wei Bian, Mukesh Prasad

Abstract

High-quality surface normals can help improve geometry estimation in problems faced by autonomous vehicles, such as collision avoidance and occlusion inference. While a considerable volume of literature focuses on densely scanned indoor scenarios, normal estimation during autonomous driving remains an intricate problem due to the sparse, non-uniform, and noisy nature of real-world LiDAR scans. In this paper, we introduce a multi-modal technique that leverages 3D point clouds and 2D colour images obtained from LiDAR and camera sensors for surface normal estimation. We present the Hybrid Geometric Transformer (HGT), a novel transformer-based neural network architecture that proficiently fuses visual semantic and 3D geometric information. Furthermore, we develop an effective learning strategy for the multi-modal data. Experimental results demonstrate the superior effectiveness of our information fusion approach compared to existing methods. We also verify that the proposed model can learn from a simulated 3D environment that mimics a traffic scene. The learned geometric knowledge is transferable and can be applied to real-world 3D scenes in the KITTI dataset. Further tasks built upon the estimated normal vectors in the KITTI dataset show that the proposed estimator has an advantage over existing methods.
