Paper Title

Modularity benefits reinforcement learning agents with competing homeostatic drives

Paper Authors

Zack Dulberg, Rachit Dubey, Isabel M. Berwian, Jonathan D. Cohen

Paper Abstract

The problem of balancing conflicting needs is fundamental to intelligence. Standard reinforcement learning algorithms maximize a scalar reward, which requires combining different objective-specific rewards into a single number. Alternatively, different objectives could also be combined at the level of action value, such that specialist modules responsible for different objectives submit different action suggestions to a decision process, each based on rewards that are independent of one another. In this work, we explore the potential benefits of this alternative strategy. We investigate a biologically relevant multi-objective problem, the continual homeostasis of a set of variables, and compare a monolithic deep Q-network to a modular network with a dedicated Q-learner for each variable. We find that the modular agent: a) requires minimal exogenously determined exploration; b) has improved sample efficiency; and c) is more robust to out-of-domain perturbation.
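The abstract contrasts a monolithic agent, which collapses all objective-specific rewards into one scalar, with a modular agent whose per-objective learners are combined only at the level of action values. Below is a minimal sketch of that modular scheme. It uses tabular Q-learning for brevity (the paper compares deep Q-networks), assumes a simple additive combination of module Q-values at decision time, and all sizes and hyperparameters (N_MODULES, N_STATES, ALPHA, etc.) are illustrative placeholders rather than the paper's settings.

```python
# Minimal sketch (not the authors' code): one Q-learner per homeostatic
# variable, each trained on its own reward, with action values summed
# across modules at decision time.
import numpy as np

N_MODULES = 2      # e.g., two homeostatic variables (hypothetical)
N_STATES = 10      # discretized internal-state space (hypothetical)
N_ACTIONS = 4
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.05

# One independent Q-table per module.
Q = np.zeros((N_MODULES, N_STATES, N_ACTIONS))

def select_action(state: int) -> int:
    """Combine modules at the action-value level: sum Q-values across
    modules, then act greedily (with a little epsilon exploration)."""
    if np.random.rand() < EPS:
        return int(np.random.randint(N_ACTIONS))
    combined = Q[:, state, :].sum(axis=0)  # shape: (N_ACTIONS,)
    return int(np.argmax(combined))

def update(state: int, action: int, rewards, next_state: int) -> None:
    """Each module runs an ordinary Q-learning update on its OWN
    objective-specific reward; no scalar reward is ever shared."""
    for m in range(N_MODULES):
        td_target = rewards[m] + GAMMA * Q[m, next_state].max()
        Q[m, state, action] += ALPHA * (td_target - Q[m, state, action])
```

A monolithic baseline would instead collapse `rewards` into a single scalar (e.g., their sum) and train one Q-table on it; that single-number combination versus the action-value combination above is exactly the comparison the paper draws.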
