Optimizing Bipedal Maneuvers of Single Rigid-Body Models for Reinforcement Learning

Paper Title

Optimizing Bipedal Maneuvers of Single Rigid-Body Models for Reinforcement Learning

Authors

Ryan Batke, Fangzhou Yu, Jeremy Dao, Jonathan Hurst, Ross L. Hatton, Alan Fern, Kevin Green

Abstract

In this work, we propose a method to generate reduced-order model reference trajectories for general classes of highly dynamic maneuvers for bipedal robots, for use in sim-to-real reinforcement learning. Our approach is to use a single rigid-body model (SRBM) to optimize libraries of trajectories offline, which then serve as expert references in the reward function of a learned policy. This method translates the model's dynamically rich rotational and translational behaviour to a full-order robot model and successfully transfers to real hardware. The SRBM's simplicity allows for fast iteration and refinement of behaviors, while the robustness of learning-based controllers allows highly dynamic motions to be transferred to hardware. Within this work we introduce a set of transferability constraints that amend the SRBM dynamics to actual bipedal robot hardware, our framework for creating optimal trajectories for a variety of highly dynamic maneuvers, and our approach to integrating reference trajectories into a high-speed running reinforcement learning policy. We validate our methods on the bipedal robot Cassie, on which we successfully demonstrated highly dynamic grounded running gaits at up to 3.0 m/s.
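The abstract states that the SRBM trajectories act as expert references in the reward function but does not give the reward's form. As a rough illustration only, the sketch below assumes a commonly used exponential tracking term; the function name `tracking_reward`, the state keys, the weights, and the sample values are all hypothetical and are not taken from the paper.

```python
import math


def tracking_reward(state, reference, weights):
    """Return a reward in (0, 1] that equals 1 when state matches reference.

    Assumed form (not from the paper): exp(-sum_k w_k * ||s_k - r_k||^2).
    """
    weighted_error = 0.0
    for key, weight in weights.items():
        # Squared Euclidean distance between the state and reference entry.
        squared = sum((s - r) ** 2 for s, r in zip(state[key], reference[key]))
        weighted_error += weight * squared
    return math.exp(-weighted_error)


# Hypothetical sample from an offline SRBM trajectory library (made-up values).
reference = {"com_velocity": [3.0, 0.0, 0.0], "body_rpy": [0.0, 0.05, 0.0]}
# Hypothetical state of the full-order robot at the same phase of the gait.
state = {"com_velocity": [2.8, 0.1, 0.0], "body_rpy": [0.0, 0.0, 0.0]}
# Hypothetical per-term weights.
weights = {"com_velocity": 1.0, "body_rpy": 5.0}

reward = tracking_reward(state, reference, weights)
```

With a term of this shape, the policy is rewarded for staying close to the reduced-order reference without being forced to reproduce it exactly, which is one plausible way a learned controller could absorb the mismatch between the SRBM and the full robot.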
