Augmenting Reinforcement Learning with Transformer-based Scene Representation Learning for Decision-making of Autonomous Driving

Paper Title

Augmenting Reinforcement Learning with Transformer-based Scene Representation Learning for Decision-making of Autonomous Driving

Authors

Haochen Liu, Zhiyu Huang, Xiaoyu Mo, Chen Lv

Abstract

Decision-making for urban autonomous driving is challenging due to the stochastic nature of interactive traffic participants and the complexity of road structures. Although reinforcement learning (RL)-based decision-making schemes are promising for handling urban driving scenarios, they suffer from low sample efficiency and poor adaptability. In this paper, we propose the Scene-Rep Transformer to improve RL decision-making capabilities through better scene representation encoding and sequential predictive latent distillation. Specifically, a multi-stage Transformer (MST) encoder is constructed to model not only the interaction awareness between the ego vehicle and its neighbors but also the intention awareness between the agents and their candidate routes. A sequential latent Transformer (SLT) with self-supervised learning objectives is employed to distill future predictive information into the latent scene representation, in order to reduce the exploration space and speed up training. The final decision-making module, based on soft actor-critic (SAC), takes as input the refined latent scene representation from the Scene-Rep Transformer and outputs driving actions. The framework is validated in five challenging simulated urban scenarios with dense traffic, and it shows substantial quantitative improvements in data efficiency and in performance as measured by success rate, safety, and efficiency. The qualitative results reveal that our framework is able to extract the intentions of neighbor agents to help make decisions and deliver more diversified driving behaviors.
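
The abstract describes an encoder that models interaction between the ego vehicle and its neighbors. As a rough illustration only, the sketch below shows generic scaled dot-product attention, the standard building block of Transformer encoders, with an ego feature vector attending over neighbor feature vectors. It is not the authors' MST encoder: the function names, the two-dimensional toy features, and the use of the neighbor features as both keys and values are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the attention weights over the keys and the weighted
    sum of the values (the context vector).
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    context = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, context


# Hypothetical toy features (assumed for illustration, not from the paper).
ego = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# The ego query attends over the neighbors, used here as keys and values.
weights, context = attend(ego, neighbors, neighbors)
print("attention weights:", weights)
print("context vector:", context)
```

In this toy setup the neighbor whose features align most closely with the ego query receives the largest weight. A full encoder would add learned projections, multiple heads, and stacked layers, none of which are specified in the abstract.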
