Paper Title
Approximating Martingale Process for Variance Reduction in Deep Reinforcement Learning with Large State Space
Paper Authors
Paper Abstract
Approximating Martingale Process (AMP) has been proven effective for variance reduction in reinforcement learning (RL) in specific settings such as multiclass queueing networks. In those previously studied cases, however, the state space is relatively small and all possible state transitions can be enumerated. In this paper, we consider systems in which the state space is large and state transitions are uncertain, thereby extending AMP into a generalized variance-reduction method for RL. Specifically, we investigate the application of AMP to ride-hailing systems such as Uber, where Proximal Policy Optimization (PPO) is employed to optimize the policy for matching drivers with customers.
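To make the control-variate mechanism concrete, below is a minimal sketch of an AMP-style correction to a Monte Carlo return. It assumes an approximate value function V has already been fitted and that the conditional expectation E[V(x_{k+1}) | x_k] is estimated separately (e.g. by averaging V over a few simulated next states, since a large state space rules out exact enumeration). The function name `amp_adjusted_return` and its signature are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def amp_adjusted_return(rewards, next_values, expected_next_values, gamma=0.99):
    """Variance-reduced return estimate using an AMP-style control variate.

    The increments gamma^{k+1} * (V(x_{k+1}) - E[V(x_{k+1}) | x_k]) have zero
    conditional mean, so their running sum is (approximately) a martingale.
    Subtracting it from the Monte Carlo return leaves the estimate unbiased
    while shrinking its variance when V approximates the true value function.
    """
    rewards = np.asarray(rewards, dtype=float)
    T = len(rewards)
    discounts = gamma ** np.arange(T)                # gamma^k for k = 0..T-1
    mc_return = np.sum(discounts * rewards)          # plain Monte Carlo return
    increments = gamma * discounts * (
        np.asarray(next_values, dtype=float)
        - np.asarray(expected_next_values, dtype=float)
    )                                                # zero-mean martingale increments
    return mc_return - np.sum(increments)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 5
    rewards = rng.uniform(0.0, 1.0, size=T)
    # With a large state space, E[V(x') | x] cannot be enumerated exactly;
    # here we stand in for an estimate obtained by sampling next states.
    expected_next_values = rng.normal(1.0, 0.1, size=T)
    next_values = expected_next_values + rng.normal(0.0, 0.2, size=T)
    print(amp_adjusted_return(rewards, next_values, expected_next_values))
```

Because each increment has zero conditional mean, the correction does not bias the return estimate; the quality of V only determines how much variance is removed.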