Paper Title
Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer
Paper Authors
Paper Abstract
With the prevalence of LiDAR sensors in autonomous driving, 3D object tracking has received increasing attention. In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template. Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations. PTTR consists of three novel designs. 1) Instead of random sampling, we design Relation-Aware Sampling to preserve points relevant to the given template during subsampling. 2) We propose a Point Relation Transformer for effective feature aggregation and feature matching between the template and search region. 3) Based on the coarse tracking results, we employ a novel Prediction Refinement Module to obtain the final refined prediction through local feature pooling. In addition, motivated by the favorable properties of the Bird's-Eye View (BEV) of point clouds in capturing object motion, we further design a more advanced framework named PTTR++, which incorporates both the point-wise view and the BEV representation to exploit their complementary effect in generating high-quality tracking results. PTTR++ substantially boosts the tracking performance on top of PTTR with low computational overhead. Extensive experiments over multiple datasets show that our proposed approaches achieve superior 3D tracking accuracy and efficiency.
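To make the Relation-Aware Sampling idea from the abstract concrete, below is a minimal PyTorch sketch. It assumes relevance is measured by feature-space distance between each search-region point and its nearest template point, and that the most relevant points are kept; the abstract does not specify the exact relevance measure, so the function name, signature, and distance choice here are illustrative assumptions, not the authors' implementation.

```python
import torch

def relation_aware_sampling(search_feats, template_feats, num_samples):
    """Illustrative sketch of Relation-Aware Sampling (RAS).

    Instead of subsampling the search region at random, keep the search
    points whose features are closest to any template point feature.
    The relevance measure (L2 feature distance) is an assumption for
    illustration; the paper's exact formulation may differ.

    search_feats:   (N, C) features of the search-region points
    template_feats: (M, C) features of the template points
    num_samples:    number of search points to keep (k <= N)
    Returns the indices of the k most template-relevant search points.
    """
    # Pairwise L2 distances between search and template features: (N, M).
    dists = torch.cdist(search_feats, template_feats)
    # Relevance of a search point = distance to its nearest template point.
    min_dists, _ = dists.min(dim=1)
    # Keep the points most similar to the template (smallest distances).
    idx = torch.topk(min_dists, k=num_samples, largest=False).indices
    return idx

# Example usage with random features (hypothetical sizes):
search = torch.randn(1024, 64)    # 1024 search points, 64-dim features
template = torch.randn(256, 64)   # 256 template points
kept_idx = relation_aware_sampling(search, template, num_samples=512)
```

Compared with random sampling, this kind of template-conditioned selection biases the subsampled set toward the tracked object, which is what allows the subsequent feature-matching stage to operate on fewer but more informative points.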