Paper Title

Modularity benefits reinforcement learning agents with competing homeostatic drives

Paper Authors

Zack Dulberg, Rachit Dubey, Isabel M. Berwian, Jonathan D. Cohen

Paper Abstract

The problem of balancing conflicting needs is fundamental to intelligence. Standard reinforcement learning algorithms maximize a scalar reward, which requires combining different objective-specific rewards into a single number. Alternatively, different objectives could also be combined at the level of action value, such that specialist modules responsible for different objectives submit different action suggestions to a decision process, each based on rewards that are independent of one another. In this work, we explore the potential benefits of this alternative strategy. We investigate a biologically relevant multi-objective problem, the continual homeostasis of a set of variables, and compare a monolithic deep Q-network to a modular network with a dedicated Q-learner for each variable. We find that the modular agent: a) requires minimal exogenously determined exploration; b) has improved sample efficiency; and c) is more robust to out-of-domain perturbation.
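The abstract contrasts a monolithic agent, which collapses all objective-specific rewards into one scalar, with a modular agent whose per-objective learners are combined only at the level of action values. Below is a minimal sketch of that modular scheme. It uses tabular Q-learning for brevity (the paper compares deep Q-networks), assumes a simple additive combination of module Q-values at decision time, and all sizes and hyperparameters (N_MODULES, N_STATES, ALPHA, etc.) are illustrative placeholders rather than the paper's settings.

```python
# Minimal sketch (not the authors' code): one Q-learner per homeostatic
# variable, each trained on its own reward, with action values summed
# across modules at decision time.
import numpy as np

N_MODULES = 2      # e.g., two homeostatic variables (hypothetical)
N_STATES = 10      # discretized internal-state space (hypothetical)
N_ACTIONS = 4
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.05

# One independent Q-table per module.
Q = np.zeros((N_MODULES, N_STATES, N_ACTIONS))

def select_action(state: int) -> int:
    """Combine modules at the action-value level: sum Q-values across
    modules, then act greedily (with a little epsilon exploration)."""
    if np.random.rand() < EPS:
        return int(np.random.randint(N_ACTIONS))
    combined = Q[:, state, :].sum(axis=0)  # shape: (N_ACTIONS,)
    return int(np.argmax(combined))

def update(state: int, action: int, rewards, next_state: int) -> None:
    """Each module runs an ordinary Q-learning update on its OWN
    objective-specific reward; no scalar reward is ever shared."""
    for m in range(N_MODULES):
        td_target = rewards[m] + GAMMA * Q[m, next_state].max()
        Q[m, state, action] += ALPHA * (td_target - Q[m, state, action])
```

A monolithic baseline would instead collapse `rewards` into a single scalar (e.g., their sum) and train one Q-table on it; that single-number combination versus the action-value combination above is exactly the comparison the paper draws.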
