Paper title
End-Effect Exploration Drive for Effective Motor Learning
Paper authors
Paper abstract
Stemming from the idea that a key objective in reinforcement learning is to invert a target distribution of effects, end-effect drives are proposed as an effective way to implement goal-directed motor learning in the absence of an explicit forward model. An end-effect model relies on a simple statistical record of the effects of the current policy, used here as a substitute for more resource-demanding forward models. When combined with a reward structure, it forms the core of a lightweight variational free-energy minimization setup. The main difficulty lies in maintaining this simplified effect model alongside the online update of the policy. When the prior target distribution is uniform, the model provides a way to learn an efficient exploration policy, consistent with intrinsic-curiosity principles. When combined with an extrinsic reward, our approach is shown to train faster than traditional off-policy techniques.
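The abstract's central object, a "simple statistical recording of the effect of the current policy," can be illustrated with a minimal sketch. The class names and the count-based representation below are illustrative assumptions, not the authors' implementation; the bonus function only illustrates the stated idea that a uniform prior over effects yields a curiosity-like signal favoring rarely observed effects.

```python
import math
from collections import defaultdict


class EndEffectModel:
    """Running statistical record of the effects produced by the current policy."""

    def __init__(self):
        self.counts = defaultdict(int)  # effect -> number of times observed
        self.total = 0

    def update(self, effect):
        """Record one effect observed after executing the current policy."""
        self.counts[effect] += 1
        self.total += 1

    def prob(self, effect):
        """Laplace-smoothed empirical probability of an effect."""
        return (self.counts[effect] + 1) / (self.total + len(self.counts) + 1)


def exploration_bonus(model, effect):
    # Under a uniform prior target distribution, rarely observed effects
    # receive a larger (surprise-like) bonus, as in intrinsic curiosity.
    return -math.log(model.prob(effect))
```

For example, after recording the effect `"a"` nine times and `"b"` once, `exploration_bonus` is larger for `"b"` than for `"a"`, so a policy maximizing this bonus is pushed toward under-visited effects.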