Paper Title
A neural network model for timing control with reinforcement
Paper Authors
Paper Abstract
How do humans and animals perform trial-and-error learning when the space of possibilities is infinite? In a previous study, we used an interval timing production task and discovered an updating strategy in which the agent adjusted behavioral and neuronal noise for exploration. In the experiment, human subjects proactively generated a series of timed motor outputs. We found that sequential motor timing varied at two temporal scales: long-term correlation around the target interval due to memory drift, and short-term adjustments of timing variability according to feedback. We previously described these features of timing variability with an augmented Gaussian process, termed the reward-sensitive Gaussian process (RSGP). Here we provide a mechanistic model and simulate the process by borrowing the architecture of recurrent neural networks. While recurrent connections provide the long-term serial correlation in motor timing, to capture the reward-driven short-term variations we introduced reward-dependent variability in the network connectivity, inspired by the stochastic nature of synaptic transmission in the brain. Our model recursively generates an output sequence that incorporates internal variability and external reinforcement in a Bayesian framework. We show that the model can learn the key features of human behavior. Unlike other neural network models that search for a unique network connectivity giving the best match between model prediction and observation, this model can estimate the uncertainty associated with each outcome and thus does a better job of teasing apart adjustable, task-relevant variability from unexplained variability. The proposed artificial neural network model parallels the mechanisms of information processing in neural systems and can extend the framework of brain-inspired reinforcement learning to continuous state control.
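To make the two-timescale idea in the abstract concrete, below is a minimal sketch of a leaky recurrent network that produces a timed interval on each trial, with slow state drift supplying the long-term serial correlation and connectivity noise scaled by recent reward supplying the short-term, feedback-driven adjustment of variability. All names, network sizes, and constants are illustrative assumptions, not the authors' actual model or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 50                     # number of recurrent units (illustrative)
target = 0.8               # target interval in seconds (illustrative)
n_trials = 500

W = rng.normal(0, 1.0 / np.sqrt(N), (N, N))   # baseline recurrent weights
w_out = rng.normal(0, 1.0 / np.sqrt(N), N)    # linear readout weights
x = rng.normal(0, 0.1, N)                     # recurrent state, carried across trials

sigma_drift = 0.01         # slow memory drift -> long-term correlation
sigma_max = 0.05           # maximum connectivity noise when reward is low
reward = 1.0               # running reward signal in [0, 1]

produced = np.zeros(n_trials)
for t in range(n_trials):
    # Short-term mechanism: synaptic-transmission-like noise on the
    # connectivity, larger after low reward (more exploration).
    noise_scale = sigma_max * (1.0 - reward)
    W_trial = W + rng.normal(0, noise_scale, (N, N))

    # One relaxation step of a leaky recurrent network; the small additive
    # noise makes the state drift slowly from trial to trial.
    x = 0.9 * x + 0.1 * np.tanh(W_trial @ x) + rng.normal(0, sigma_drift, N)

    # Read out the produced interval as a deviation around the target.
    produced[t] = target + w_out @ x

    # Graded reward: closer to the target -> higher reward on the next trial.
    reward = np.exp(-0.5 * ((produced[t] - target) / 0.05) ** 2)
```

Under these assumptions, `produced` shows slowly drifting errors around `target` (long-term correlation), while its trial-to-trial variance shrinks after rewarded trials and grows after unrewarded ones, mirroring the RSGP-like behavior described above.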