Paper Title

Computationally Efficient Reinforcement Learning: Targeted Exploration leveraging Simple Rules

Authors

Loris Di Natale, Bratislav Svetozarevic, Philipp Heer, Colin N. Jones

Abstract

Model-free Reinforcement Learning (RL) generally suffers from poor sample complexity, mostly due to the need to exhaustively explore the state-action space to find well-performing policies. On the other hand, we postulate that expert knowledge of the system often allows us to design simple rules we expect good policies to follow at all times. In this work, we hence propose a simple yet effective modification of continuous actor-critic frameworks to incorporate such rules and avoid regions of the state-action space that are known to be suboptimal, thereby significantly accelerating the convergence of RL agents. Concretely, we saturate the actions chosen by the agent if they do not comply with our intuition and, critically, modify the gradient update step of the policy to ensure the learning process is not affected by the saturation step. On a room temperature control case study, it allows agents to converge to well-performing policies up to 6-7x faster than classical agents without computational overhead and while retaining good final performance.
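
To make the mechanism concrete, below is a minimal sketch (not the authors' implementation) of the two ingredients described in the abstract, assuming a DDPG/TD3-style deterministic actor-critic in PyTorch: the agent's action is saturated into a rule-compliant interval before execution, and the actor loss uses a straight-through gradient so the saturation step does not block the policy update. The `rule_bounds` helper, its placeholder bounds, and the straight-through formulation are illustrative assumptions, not details taken from the paper.

```python
import torch

def rule_bounds(state):
    """Hypothetical expert rule: map each state to per-action bounds that
    any good policy is expected to respect (placeholder values here)."""
    batch = state.shape[0]
    low = torch.full((batch, 1), -1.0)
    high = torch.full((batch, 1), 1.0)
    return low, high

def saturate(action, low, high):
    # Clip the agent's action into the rule-compliant interval.
    return torch.maximum(torch.minimum(action, high), low)

def actor_loss(actor, critic, state):
    raw_action = actor(state)                      # unsaturated policy output
    low, high = rule_bounds(state)
    safe_action = saturate(raw_action, low, high)  # action actually executed
    # Straight-through gradient (an assumption about the "modified gradient
    # update step"): the critic evaluates the saturated action, but gradients
    # flow back to the raw action so learning is not stalled by the clipping.
    action_st = raw_action + (safe_action - raw_action).detach()
    return -critic(state, action_st).mean()
```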
