Paper Title
Efficient Sampling-Based Maximum Entropy Inverse Reinforcement Learning with Application to Autonomous Driving
Paper Authors
Paper Abstract
In the past decades, we have witnessed significant progress in the domain of autonomous driving. Advanced techniques based on optimization and reinforcement learning (RL) have become increasingly powerful at solving the forward problem: given designed reward/cost functions, how should we optimize them to obtain driving policies that interact with the environment safely and efficiently? Such progress has raised another equally important question: \emph{what should we optimize}? Instead of manually specifying the reward functions, it is desirable to extract what human drivers try to optimize from real traffic data and assign it to autonomous vehicles, enabling more naturalistic and transparent interaction between humans and intelligent agents. To address this issue, we present an efficient sampling-based maximum-entropy inverse reinforcement learning (IRL) algorithm in this paper. Different from existing IRL algorithms, the proposed algorithm introduces an efficient continuous-domain trajectory sampler and can therefore learn the reward functions directly in the continuous domain, while accounting for the uncertainties in the trajectories demonstrated by human drivers. We evaluate the proposed algorithm on real driving data, including both non-interactive and interactive scenarios. The experimental results show that the proposed algorithm achieves more accurate prediction performance with faster convergence speed and better generalization compared to other baseline IRL algorithms.
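For background, the following is a minimal sketch of the standard maximum-entropy IRL formulation (following Ziebart et al.) that sampling-based variants such as the one summarized above build on; the notation $\xi$, $R_\theta$, $q$, and $w_i$ is illustrative and is not taken from the paper itself. A trajectory $\xi$ is assumed to be demonstrated with probability proportional to the exponential of its parameterized reward $R_\theta$:
\[
P(\xi \mid \theta) = \frac{\exp\big(R_\theta(\xi)\big)}{Z(\theta)}, \qquad
Z(\theta) = \int \exp\big(R_\theta(\xi)\big)\, d\xi ,
\]
so that maximizing the log-likelihood of the demonstration set $\mathcal{D}$ yields the gradient
\[
\nabla_\theta \mathcal{L}(\theta)
= \frac{1}{|\mathcal{D}|} \sum_{\xi \in \mathcal{D}} \nabla_\theta R_\theta(\xi)
\;-\; \mathbb{E}_{\xi \sim P(\cdot \mid \theta)}\big[\nabla_\theta R_\theta(\xi)\big].
\]
The second expectation, taken over the continuous trajectory space, is intractable in general; a sampling-based approximation draws trajectories $\{\xi_i\}_{i=1}^{N}$ from a tractable proposal density $q$ and reweights them with self-normalized importance weights,
\[
\mathbb{E}_{\xi \sim P(\cdot \mid \theta)}\big[\nabla_\theta R_\theta(\xi)\big]
\approx \sum_{i=1}^{N} w_i \,\nabla_\theta R_\theta(\xi_i),
\qquad
w_i \propto \frac{\exp\big(R_\theta(\xi_i)\big)}{q(\xi_i)}, \quad \sum_{i=1}^{N} w_i = 1 .
\]
The accuracy and efficiency of such an approximation hinge on how well $q$ covers the high-reward regions of the continuous trajectory space, which is presumably the role of the continuous-domain trajectory sampler referred to in the abstract.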