Learning to Locomote: Understanding How Environment Design Matters for Deep Reinforcement Learning

Paper Title

Learning to Locomote: Understanding How Environment Design Matters for Deep Reinforcement Learning

Authors

Daniele Reda, Tianxin Tao, Michiel van de Panne

Abstract

Learning to locomote is one of the most common tasks in physics-based animation and deep reinforcement learning (RL). A learned policy is the product of the problem to be solved, as embodied by the RL environment, and the RL algorithm. While enormous attention has been devoted to RL algorithms, much less is known about the impact of design choices for the RL environment. In this paper, we show that environment design matters in significant ways and document how it can contribute to the brittle nature of many RL results. Specifically, we examine choices related to state representations, initial state distributions, reward structure, control frequency, episode termination procedures, curriculum usage, the action space, and the torque limits. We aim to stimulate discussion around such choices, which in practice strongly impact the success of RL when applied to continuous-action control problems of interest to animation, such as learning to locomote.
