Paper Title

Safer Autonomous Driving in a Stochastic, Partially-Observable Environment by Hierarchical Contingency Planning

Authors

Ugo Lecerf, Christelle Yemdji-Tchassi, Pietro Michiardi

Abstract

When learning to act in a stochastic, partially observable environment, an intelligent agent should be prepared to anticipate a change in its belief of the environment state, and be capable of adapting its actions on the fly to changing conditions. As humans, we are able to form contingency plans when learning a task, with the explicit aim of correcting errors in the initial control; these plans prove useful whenever a sudden change in our perception of the environment requires immediate corrective action. This is especially the case for autonomous vehicles (AVs) navigating real-world situations where safety is paramount, and a strong ability to react to a changing belief about the environment is truly needed. In this paper we explore an end-to-end approach, from training to execution, for learning robust contingency plans and combining them with a hierarchical planner to obtain a robust agent policy in an autonomous navigation task where other vehicles' behaviours are unknown, and the agent's belief about these behaviours is subject to sudden, last-second change. We show that our approach results in robust, safe behaviour in a partially observable, stochastic environment, generalizing well over environment dynamics not seen during training.
