Paper Title

On the Reliability and Generalizability of Brain-inspired Reinforcement Learning Algorithms

Authors

Dongjae Kim, Jee Hang Lee, Jae Hoon Shin, Minsu Abel Yang, Sang Wan Lee

Abstract

Although deep RL models have shown great potential for solving various types of tasks with minimal supervision, several key challenges remain: learning from limited experience, adapting to environmental changes, and generalizing what is learned from a single task. Recent evidence in decision neuroscience has shown that the human brain has an innate capacity to resolve these issues, leading to optimism about developing neuroscience-inspired solutions toward sample-efficient and generalizable RL algorithms. We show that a computational model combining model-based and model-free control, which we term prefrontal RL, reliably encodes the information of the high-level policy that humans learned, and that this model can generalize the learned policy to a wide range of tasks. First, we trained prefrontal RL and deep RL algorithms on data from 82 subjects, collected while the participants performed two-stage Markov decision tasks in which we manipulated the goal, state-transition uncertainty, and state-space complexity. In the reliability test, which includes the latent behavior profile test and the parameter recoverability test, we showed that prefrontal RL reliably learned the latent policies of the humans, while all the other models failed. Second, to test how well these models generalize what they learned from the original task, we situated them in the context of environmental volatility. Specifically, we ran large-scale simulations with 10 Markov decision tasks in which latent context variables change over time. Our information-theoretic analysis showed that prefrontal RL achieved the highest level of adaptability and episodic encoding efficacy. This is the first attempt to formally test the possibility that computational models mimicking the way the brain solves general problems can lead to practical solutions to key challenges in machine learning.
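
To make the phrase "combining model-based and model-free control" concrete, the following is a minimal illustrative sketch and not the authors' method: the abstract does not say how prefrontal RL combines the two controllers. The sketch assumes a fixed mixing weight `w`, a known first-stage transition matrix, and a simple temporal-difference update; every function and parameter name here is hypothetical.

```python
def model_free_update(q_mf, state, action, reward, next_value,
                      alpha=0.1, gamma=1.0):
    """Apply one temporal-difference update to a model-free action value.

    alpha (learning rate) and gamma (discount) are assumed values.
    Returns the TD error used for the update.
    """
    td_error = reward + gamma * next_value - q_mf[state][action]
    q_mf[state][action] += alpha * td_error
    return td_error


def model_based_values(transition, stage2_values):
    """Expected value of each first-stage action under a transition model.

    transition[a][s2] is the assumed probability of reaching second-stage
    state s2 after first-stage action a.
    """
    return [sum(p * v for p, v in zip(probs, stage2_values))
            for probs in transition]


def combine(q_mb, q_mf, w):
    """Mix the two value estimates with a fixed weight (an assumption).

    w = 1.0 is purely model-based, w = 0.0 is purely model-free.
    """
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_mb, q_mf)]


# Toy two-stage task: two first-stage actions, two second-stage states.
transition = [[0.7, 0.3], [0.3, 0.7]]   # hypothetical transition model
stage2_values = [1.0, 0.0]              # hypothetical second-stage values
q_mf = [[0.0, 0.0]]                     # model-free values for one state

q_mb = model_based_values(transition, stage2_values)
model_free_update(q_mf, state=0, action=0, reward=1.0, next_value=0.0)
q_mixed = combine(q_mb, q_mf[0], w=0.5)
```

A fixed `w` is the simplest possible choice; a model that adjusts the balance between the two controllers over time would replace this constant with a learned or state-dependent quantity.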
