Reward Design for Driver Repositioning Using Multi-Agent Reinforcement Learning

Paper Title

Reward Design for Driver Repositioning Using Multi-Agent Reinforcement Learning

Authors

Shou, Zhenyu; Di, Xuan

Abstract

A large portion of passenger requests is reportedly unserviced, partially due to vacant for-hire drivers' cruising behavior during the passenger seeking process. This paper aims to model the multi-driver repositioning task through a mean field multi-agent reinforcement learning (MARL) approach that captures competition among multiple agents. Because the direct application of MARL to the multi-driver system under a given reward mechanism will likely yield a suboptimal equilibrium due to the selfishness of drivers, this study proposes a reward design scheme with which a more desired equilibrium can be reached. To effectively solve the bilevel optimization problem, with the upper level as the reward design and the lower level as a multi-agent system, a Bayesian optimization (BO) algorithm is adopted to speed up the learning process. We then apply the bilevel optimization model to two case studies, namely, e-hailing driver repositioning under service charge and multiclass taxi driver repositioning under NYC congestion pricing. In the first case study, the model is validated by the agreement between the derived optimal control from BO and that from an analytical solution. With a simple piecewise linear service charge, the objective of the e-hailing platform can be increased by 8.4%. In the second case study, an optimal toll charge of $5.1 is solved using BO, which improves the objective of city planners by 7.9%, compared to that without any toll charge. Under this optimal toll charge, the number of taxis in the NYC central business district is decreased, indicating a better traffic condition, without substantially increasing the crowdedness of the subway system.
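
The bilevel structure described in the abstract can be illustrated with a small toy sketch. This is not the paper's model: a two-zone logit equilibrium stands in for the mean field MARL lower level, a plain grid search stands in for Bayesian optimization at the upper level, and every parameter value and function name below is a hypothetical chosen only for illustration.

```python
import math

# Hypothetical toy parameters (assumptions, not taken from the paper).
A_CBD, A_OUT = 10.0, 6.0   # base payoff per driver in the CBD / outside zone
CROWDING = 4.0             # payoff lost per unit share of competing drivers
EXTERNALITY = 2.0          # planner's congestion cost weight on the CBD share
BETA = 1.0                 # logit sensitivity of drivers to payoff differences


def equilibrium_cbd_share(toll, tol=1e-10):
    """Lower level: share of drivers in the CBD at a logit equilibrium."""
    def gap(x):
        payoff_cbd = A_CBD - CROWDING * x - toll
        payoff_out = A_OUT - CROWDING * (1.0 - x)
        choice_prob = 1.0 / (1.0 + math.exp(-BETA * (payoff_cbd - payoff_out)))
        return choice_prob - x  # strictly decreasing in x, so the root is unique

    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # bisection on the fixed-point condition
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def planner_objective(toll):
    """Upper level: driver revenue (toll treated as a transfer) minus congestion cost."""
    x = equilibrium_cbd_share(toll)
    revenue = x * (A_CBD - CROWDING * x) + (1.0 - x) * (A_OUT - CROWDING * (1.0 - x))
    return revenue - EXTERNALITY * x ** 2


def best_toll(max_toll=10.0, step=0.1):
    """Grid search over tolls; a simple stand-in for Bayesian optimization."""
    n = int(round(max_toll / step))
    candidates = [i * step for i in range(n + 1)]
    return max(candidates, key=planner_objective)


if __name__ == "__main__":
    toll = best_toll()
    print("toll:", round(toll, 2))
    print("CBD share without toll:", round(equilibrium_cbd_share(0.0), 3))
    print("CBD share with toll:", round(equilibrium_cbd_share(toll), 3))
```

Each upper-level evaluation requires solving the lower-level equilibrium from scratch, which is why a sample-efficient search such as BO becomes attractive when the lower level is an expensive multi-agent simulation rather than a closed-form fixed point.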
