Paper Title
Deep Reinforcement Learning with Population-Coded Spiking Neural Network for Continuous Control
Paper Authors
Paper Abstract
The energy-efficient control of mobile robots is crucial as the complexity of their real-world applications increasingly involves high-dimensional observation and action spaces, which cannot be offset by limited on-board resources. An emerging non-von Neumann model of intelligence, where spiking neural networks (SNNs) are run on neuromorphic processors, is regarded as an energy-efficient and robust alternative to state-of-the-art real-time robotic controllers for low-dimensional control tasks. The challenge now for this new computing paradigm is to scale so that it can keep up with real-world tasks. To do so, SNNs need to overcome the inherent limitations of their training, namely the limited ability of their spiking neurons to represent information and the lack of effective learning algorithms. Here, we propose a population-coded spiking actor network (PopSAN) trained in conjunction with a deep critic network using deep reinforcement learning (DRL). The population coding scheme dramatically increases the representation capacity of the network, and the hybrid learning combines the training advantages of deep networks with the energy-efficient inference of spiking networks. To show the general applicability of our approach, we integrated it with a spectrum of both on-policy and off-policy DRL algorithms. We deployed the trained PopSAN on Intel's Loihi neuromorphic chip and benchmarked our method against mainstream DRL algorithms for continuous control. To allow for a fair comparison among all methods, we validated them on OpenAI Gym tasks. Our Loihi-run PopSAN consumed 140 times less energy per inference than the deep actor network on a Jetson TX2, while reaching the same level of performance. Our results support the efficiency of neuromorphic controllers and suggest our hybrid RL framework as an alternative to deep learning when both energy efficiency and robustness are important.
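For intuition, below is a minimal sketch of how a population coding scheme of the kind described in the abstract can turn a continuous observation into spike trains: each observation dimension is covered by a small population of encoder neurons with Gaussian receptive fields, and the resulting activations are used as per-timestep spike probabilities. The function name `encode` and the parameters `pop_size`, `mean_range`, `std`, and `timesteps` are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of population coding for continuous control inputs.
# Assumption: Gaussian receptive fields per observation dimension, with
# Bernoulli spike sampling; names and defaults here are hypothetical.
import numpy as np

def encode(obs, pop_size=10, mean_range=(-3.0, 3.0), std=0.5,
           timesteps=5, rng=None):
    """Encode a 1-D observation vector into binary spike trains.

    Returns an array of shape (timesteps, len(obs) * pop_size).
    """
    if rng is None:
        rng = np.random.default_rng()
    # Receptive-field centers, evenly spaced over the expected input range.
    means = np.linspace(mean_range[0], mean_range[1], pop_size)
    # Gaussian activation of each neuron in each dimension's population:
    # shape (len(obs), pop_size).
    act = np.exp(-0.5 * ((obs[:, None] - means[None, :]) / std) ** 2)
    probs = act.ravel()  # flatten to one probability per encoder neuron
    # Sample Bernoulli spikes at every timestep with these probabilities.
    return (rng.random((timesteps, probs.size)) < probs).astype(np.float32)

# Example: a 3-D observation becomes 30 spike trains over 5 timesteps.
spikes = encode(np.array([0.2, -1.5, 0.9]))
print(spikes.shape)  # (5, 30)
```

The point of this expansion is the representation-capacity claim above: a single spiking neuron conveys little information per timestep, but a population of neurons with overlapping receptive fields gives the downstream spiking actor a much richer, roughly smooth code for each continuous input dimension.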