Paper Title

Model-Based Imitation Learning Using Entropy Regularization of Model and Policy

Authors

Uchibe, Eiji

Abstract

Approaches to imitation learning based on generative adversarial networks are promising because they are sample efficient in terms of expert demonstrations. However, training the generator requires many interactions with the actual environment because model-free reinforcement learning is used to update the policy. To improve sample efficiency through model-based reinforcement learning, we propose Model-Based Entropy-Regularized Imitation Learning (MB-ERIL), formulated under the entropy-regularized Markov decision process, to reduce the number of interactions with the actual environment. MB-ERIL uses two discriminators: a policy discriminator distinguishes actions generated by the robot from expert actions, and a model discriminator distinguishes counterfactual state transitions generated by the model from actual transitions. We derive structured discriminators so that learning the policy and the model is efficient. Computer simulations and real robot experiments show that MB-ERIL achieves competitive performance and significantly improves sample efficiency compared to baseline methods.
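The two-discriminator idea described above can be sketched in miniature: one classifier separates expert state-action pairs from the robot's, and a second separates actual state transitions from model-generated ones. This is a minimal illustrative sketch only; the toy Gaussian data, the plain linear-logistic discriminators, and all variable names are assumptions for demonstration, not the paper's structured discriminators.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy stand-ins (assumed shapes): expert vs. robot state-action pairs,
# and actual vs. model-generated state transitions.
expert_sa = rng.normal(1.0, 1.0, size=(64, 4))
robot_sa = rng.normal(-1.0, 1.0, size=(64, 4))
actual_trans = rng.normal(1.0, 1.0, size=(64, 6))
model_trans = rng.normal(-1.0, 1.0, size=(64, 6))

def train_discriminator(pos, neg, steps=200, lr=0.1):
    """Fit a linear-logistic discriminator by gradient descent on the
    binary cross-entropy loss (positives labeled 1, negatives 0)."""
    w = np.zeros(pos.shape[1])
    for _ in range(steps):
        for x, label in ((pos, 1.0), (neg, 0.0)):
            p = sigmoid(x @ w)
            w -= lr * (x.T @ (p - label)) / len(x)
    return w

# Policy discriminator: expert vs. robot actions.
w_policy = train_discriminator(expert_sa, robot_sa)
# Model discriminator: actual vs. counterfactual (model) transitions.
w_model = train_discriminator(actual_trans, model_trans)

# In adversarial imitation learning, the discriminator outputs would then
# provide the reward signal used to update the policy and the model.
policy_acc = ((sigmoid(expert_sa @ w_policy) > 0.5).mean()
              + (sigmoid(robot_sa @ w_policy) < 0.5).mean()) / 2
```

In MB-ERIL itself the discriminators are not free-form classifiers but are structured in terms of the learned policy and model, which is what makes the adversarial training sample efficient; the sketch above only conveys the two-classifier division of labor.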
