Paper title
Compressed Imitation Learning
Paper authors
Paper abstract
In analogy to compressed sensing, which allows sample-efficient reconstruction of a signal given prior knowledge of its sparsity in the frequency domain, we propose to use policy simplicity (Occam's razor) as a prior that enables sample-efficient imitation learning. We first demonstrate the feasibility of this scheme in the linear case, where the state-value function can be sampled directly. We then extend the scheme to scenarios where only actions are visible and to scenarios where the policy is obtained from a nonlinear network. The method is benchmarked against behavior cloning and achieves significantly higher scores when expert demonstrations are limited.
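The compressed-sensing side of the analogy can be illustrated with a minimal sketch. It is not the paper's method and does not involve imitation learning. It only shows how a simplicity prior (here an L1 penalty, solved with the standard ISTA algorithm) can recover a sparse coefficient vector from fewer measurements than unknowns, where a prior-free least-squares fit cannot. All names, sizes, and parameters below (`ista`, `lam`, `n`, `m`, `k`, the random Gaussian measurement matrix) are assumptions chosen for illustration, and the sparse vector stands in for the frequency-domain coefficients of a signal.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam=0.01, n_iter=2000):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)            # gradient of the quadratic term
        x = soft_threshold(x - step * grad, lam * step)
    return x

# Hypothetical problem sizes: n unknowns, m < n measurements, k nonzeros.
rng = np.random.default_rng(0)
n, m, k = 100, 40, 4

# Sparse ground truth (stands in for sparse frequency-domain coefficients).
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)

# Random Gaussian measurements: an underdetermined linear system.
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

x_l1 = ista(A, y)               # reconstruction with the sparsity prior
x_l2 = np.linalg.pinv(A) @ y    # minimum-norm least squares, no sparsity prior

def rel_err(x):
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

err_l1, err_l2 = rel_err(x_l1), rel_err(x_l2)
print(f"relative error with L1 prior:    {err_l1:.3f}")
print(f"relative error without a prior:  {err_l2:.3f}")
```

Both reconstructions fit the same 40 measurements; they differ only in the prior used to choose among the many consistent solutions. That is the role the abstract assigns to policy simplicity when expert demonstrations are limited.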