Novelty Producing Synaptic Plasticity

Paper Title

Novelty Producing Synaptic Plasticity

Authors

Anil Yaman, Giovanni Iacca, Decebal Constantin Mocanu, George Fletcher, Mykola Pechenizkiy

Abstract

A learning process with the plasticity property often requires reinforcement signals to guide it. However, in some tasks (e.g., maze navigation), it is very difficult or even impossible to measure the performance of an agent (i.e., a fitness value) and thereby provide reinforcement, because the position of the goal is not known. The correct behavior must then be found among a vast number of possible behaviors without any knowledge of the reinforcement signals. In such cases an exhaustive search may be needed, but this may be infeasible, especially when optimizing artificial neural networks in continuous domains. In this work, we introduce novelty producing synaptic plasticity (NPSP), in which we evolve synaptic plasticity rules that produce as many novel behaviors as possible in order to find a behavior that solves the problem. We evaluate NPSP on deceptive maze-navigation environments that require complex actions and the achievement of subgoals to complete. Our results show that the search heuristic used with the proposed NPSP is indeed capable of producing many more novel behaviors than a random search taken as the baseline.
