STrajNet: Multi-modal Hierarchical Transformer for Occupancy Flow Field Prediction in Autonomous Driving

Paper title

STrajNet: Multi-modal Hierarchical Transformer for Occupancy Flow Field Prediction in Autonomous Driving

Authors

Haochen Liu, Zhiyu Huang, Chen Lv

Abstract

Forecasting the future states of surrounding traffic participants is a crucial capability for autonomous vehicles. The recently proposed occupancy flow field prediction introduces a scalable and effective representation to jointly predict surrounding agents' future motions in a scene. However, the challenging part is to model the underlying social interactions among traffic agents and the relations between occupancy and flow. Therefore, this paper proposes a novel Multi-modal Hierarchical Transformer network that fuses the vectorized (agent motion) and visual (scene flow, map, and occupancy) modalities and jointly predicts the flow and occupancy of the scene. Specifically, visual and vector features from sensory data are encoded through a multi-stage Transformer module and then a late-fusion Transformer module with temporal pixel-wise attention. Importantly, a flow-guided multi-head self-attention (FG-MSA) module is designed to better aggregate the information on occupancy and flow and model the mathematical relations between them. The proposed method is comprehensively validated on the Waymo Open Motion Dataset and compared against several state-of-the-art models. The results reveal that our model with much more compact architecture and data inputs than other methods can achieve comparable performance. We also demonstrate the effectiveness of incorporating vectorized agent motion features and the proposed FG-MSA module. Compared to the ablated model without the FG-MSA module, which won 2nd place in the 2022 Waymo Occupancy and Flow Prediction Challenge, the current model shows better separability for flow and occupancy and further performance improvements.
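
The abstract refers to a mathematical relation between occupancy and flow without spelling it out. A common way to express such a relation in occupancy flow fields is to warp the previous occupancy grid with a backward flow field. The sketch below illustrates that idea only; it is not the FG-MSA module, and the function name `warp_occupancy`, the grid shapes, and the flow convention (per-cell offsets in grid cells pointing from a current cell back to its source cell) are assumptions made for illustration.

```python
import numpy as np


def warp_occupancy(prev_occ, backward_flow):
    """Warp the previous occupancy grid into the current frame.

    prev_occ:      (H, W) occupancy values at the previous waypoint.
    backward_flow: (H, W, 2) per-cell (dx, dy) offsets in grid cells.
                   Assumed convention: each current cell points back to
                   the cell it came from.
    """
    h, w = prev_occ.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Source cell for every current cell (nearest-neighbor lookup).
    src_x = np.rint(xs + backward_flow[..., 0]).astype(int)
    src_y = np.rint(ys + backward_flow[..., 1]).astype(int)

    # Cells whose source falls outside the grid stay unoccupied.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(prev_occ)
    warped[valid] = prev_occ[src_y[valid], src_x[valid]]
    return warped


# Toy example: one occupied cell that moves two cells to the right.
occ_prev = np.zeros((4, 5))
occ_prev[2, 1] = 1.0
flow = np.zeros((4, 5, 2))
flow[..., 0] = -2.0  # every current cell looks two cells to its left
occ_warped = warp_occupancy(occ_prev, flow)
```

In this toy setup the occupied cell at row 2, column 1 appears at row 2, column 3 after warping. A learned model would typically use differentiable bilinear sampling rather than the nearest-neighbor lookup used here.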
