Paper Title

Learning as Reinforcement: Applying Principles of Neuroscience for More General Reinforcement Learning Agents

Paper Authors

Eric Zelikman, William Yin, Kenneth Wang

Paper Abstract

A significant challenge in developing AI that can generalize well is designing agents that learn about their world without being told what to learn, and apply that learning to challenges with sparse rewards. Moreover, most traditional reinforcement learning approaches explicitly separate learning and decision making in a way that does not correspond to biological learning. We implement an architecture founded in principles of experimental neuroscience, by combining computationally efficient abstractions of biological algorithms. Our approach is inspired by research on spike-timing-dependent plasticity, the transition between short- and long-term memory, and the role of various neurotransmitters in rewarding curiosity. The Neurons-in-a-Box architecture can learn in a wholly generalizable manner and demonstrates an efficient way to build and apply representations without explicitly optimizing over a set of criteria or actions. We find it performs well in many environments, including OpenAI Gym's Mountain Car, which has no reward besides touching a hard-to-reach flag on a hill; Inverted Pendulum, where it learns simple strategies to improve the time it holds a pendulum up; a video stream, where it spontaneously learns to distinguish an open and closed hand; as well as other environments such as Google Chrome's Dinosaur Game.
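The abstract names spike-timing-dependent plasticity (STDP) as one of the neuroscience principles behind the architecture. For readers unfamiliar with it, the sketch below shows the standard pair-based STDP rule: a synapse is strengthened when the presynaptic neuron fires shortly before the postsynaptic one, and weakened in the opposite order. This is a generic textbook formulation, not the paper's implementation; the constants A_PLUS, A_MINUS, and TAU are illustrative assumptions.

```python
import numpy as np

# Illustrative constants for the exponential STDP window
# (assumed values, not taken from the paper).
A_PLUS = 0.01    # potentiation amplitude when pre fires before post
A_MINUS = 0.012  # depression amplitude when post fires before pre
TAU = 20.0       # time constant of the STDP window, in milliseconds

def stdp_delta_w(t_pre: float, t_post: float) -> float:
    """Weight change for a single pre/post spike pair (spike times in ms)."""
    dt = t_post - t_pre
    if dt > 0:
        # Pre before post: long-term potentiation, decaying with the gap.
        return A_PLUS * np.exp(-dt / TAU)
    # Post before pre: long-term depression.
    return -A_MINUS * np.exp(dt / TAU)

# Example: a pre-spike at 10 ms followed by a post-spike at 15 ms
# produces a small positive weight change (LTP).
print(stdp_delta_w(10.0, 15.0))
```

Under a local rule of this shape, weights update as spikes arrive rather than in a separate optimization phase, which matches the abstract's point that learning and decision making need not be explicitly separated.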
