Title
Model-Predictive Control via Cross-Entropy and Gradient-Based Optimization
Authors
Abstract
Recent works in high-dimensional model-predictive control and model-based reinforcement learning with learned dynamics and reward models have resorted to population-based optimization methods, such as the Cross-Entropy Method (CEM), for planning a sequence of actions. To decide on an action to take, CEM searches for the action sequence with the highest return according to the dynamics model and reward. Action sequences are typically sampled at random from an unconditional Gaussian distribution and evaluated on the environment. This distribution is iteratively updated towards action sequences with higher returns. However, this planning method can be very inefficient, especially for high-dimensional action spaces. An alternative line of approaches optimizes action sequences directly via gradient descent but is prone to local optima. We propose a method to solve this planning problem by interleaving CEM and gradient descent steps in optimizing the action sequence. Our experiments show that the proposed hybrid approach converges faster, even for high-dimensional action spaces, avoids local minima, and performs as well as or better than CEM. Code accompanying the paper is available at https://github.com/homangab/gradcem.
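The interleaving idea in the abstract can be illustrated with a minimal, hypothetical sketch. It is not the code in the linked repository. Several assumptions stand in for the learned dynamics and reward models: a toy single-integrator model x_{t+1} = x_t + u_t, a quadratic reward (negative squared distance to a goal plus a small action penalty), and a hand-derived analytic gradient in place of backpropagation through a learned model. All function names (`rollout_return`, `return_grad`, `grad_cem_plan`) and hyperparameters are illustrative. Each iteration samples action sequences from a diagonal Gaussian (the CEM step), refines every sample with a few gradient-ascent steps on the model return (equivalently, gradient descent on its negative), then refits the Gaussian to the highest-return elites.

```python
import random
from typing import List

Seq = List[List[float]]  # action sequence: horizon x action_dim


def rollout_return(x0: List[float], actions: Seq, goal: List[float], lam: float = 0.01) -> float:
    """Return under a toy model x_{t+1} = x_t + u_t (assumed stand-in for a learned model)."""
    x = list(x0)
    total = 0.0
    for u in actions:
        x = [xi + ui for xi, ui in zip(x, u)]
        total -= sum((xi - gi) ** 2 for xi, gi in zip(x, goal))  # distance-to-goal penalty
        total -= lam * sum(ui * ui for ui in u)  # action-magnitude penalty
    return total


def return_grad(x0: List[float], actions: Seq, goal: List[float], lam: float = 0.01) -> Seq:
    """Analytic dJ/du for the toy model; a learned model would use autodiff instead."""
    states, x = [], list(x0)
    for u in actions:  # forward pass: states[t] holds x_{t+1}
        x = [xi + ui for xi, ui in zip(x, u)]
        states.append(x)
    grads: Seq = [[] for _ in actions]
    acc = [0.0] * len(x0)
    for s in range(len(actions) - 1, -1, -1):  # backward pass: u_s shifts every later state
        acc = [a - 2.0 * (xi - gi) for a, xi, gi in zip(acc, states[s], goal)]
        grads[s] = [a - 2.0 * lam * ui for a, ui in zip(acc, actions[s])]
    return grads


def grad_cem_plan(x0, goal, horizon=10, action_dim=2, iters=10, pop=50,
                  n_elites=5, grad_steps=5, lr=0.01, seed=0) -> Seq:
    """Hypothetical hybrid planner: CEM sampling, gradient refinement, elite refit.
    Setting grad_steps=0 reduces it to plain CEM."""
    rng = random.Random(seed)
    mu = [[0.0] * action_dim for _ in range(horizon)]
    sigma = [[1.0] * action_dim for _ in range(horizon)]
    for _ in range(iters):
        # CEM step: sample action sequences from the current diagonal Gaussian
        samples = [[[rng.gauss(m, s) for m, s in zip(mr, sr)]
                    for mr, sr in zip(mu, sigma)] for _ in range(pop)]
        # Gradient step(s): move every sample uphill on the model return
        for _ in range(grad_steps):
            samples = [[[u + lr * g for u, g in zip(ur, gr)]
                        for ur, gr in zip(seq, return_grad(x0, seq, goal))]
                       for seq in samples]
        # Keep the highest-return sequences and refit the Gaussian to them
        samples.sort(key=lambda seq: rollout_return(x0, seq, goal), reverse=True)
        elites = samples[:n_elites]
        mu = [[sum(e[t][d] for e in elites) / n_elites for d in range(action_dim)]
              for t in range(horizon)]
        sigma = [[max(1e-3, (sum((e[t][d] - mu[t][d]) ** 2 for e in elites) / n_elites) ** 0.5)
                  for d in range(action_dim)] for t in range(horizon)]
    return mu  # in an MPC loop, only mu[0] is executed before replanning


if __name__ == "__main__":
    x0, goal = [0.0, 0.0], [1.0, -1.0]
    plan = grad_cem_plan(x0, goal)
    print("first action:", plan[0])
    print("planned return:", rollout_return(x0, plan, goal))
```

Because `grad_steps=0` recovers a plain CEM planner, the two variants can be compared on the same toy problem by changing one argument. The step size `lr` is kept small because gradient steps on the summed state penalty become unstable when the learning rate exceeds roughly 2 divided by the largest curvature of the return, which grows with the horizon.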