Paper Title
How Do You Act? An Empirical Study to Understand Behavior of Deep Reinforcement Learning Agents
Paper Authors
Paper Abstract
The demand for transparency in the decision-making processes of deep reinforcement learning agents is greater than ever, due to their increased use in safety-critical and ethically challenging domains such as autonomous driving. In this empirical study, we address this lack of transparency with an approach inspired by research in the field of neuroscience. We characterize the learned representations of an agent's policy network through its activation space and perform partial network ablations to compare the representations of healthy and intentionally damaged networks. We show that the healthy agent's behavior is characterized by a distinct correlation pattern between the network's layer activations and the actions performed during an episode, and that network ablations that strongly change this pattern lead to the agent failing its trained control task. Furthermore, the learned representation of the healthy agent is characterized by a distinct pattern in its activation space reflecting its different behavioral stages during an episode; again, when this pattern is distorted by network ablations, the agent fails its trained control task. In conclusion, we argue in favor of a new perspective on artificial neural networks as objects of empirical investigation, just like biological neural systems in neuroscientific studies, paving the way toward a new standard of scientific falsifiability in research on the transparency and interpretability of artificial neural networks.
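To make the described methodology concrete, here is a minimal sketch of the kind of analysis the abstract outlines: recording a policy layer's activations over an episode, correlating each unit with the performed actions, partially ablating units, and projecting the activation space to inspect behavioral stages. The paper provides no code; `PolicyNet`, `run_episode`, `ablate_units`, and the environment callback `env_step` are hypothetical stand-ins, and Pearson correlation against the discrete action index is a simplifying assumption.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical policy network; the paper's actual architecture is not specified here.
class PolicyNet(nn.Module):
    def __init__(self, obs_dim=8, hidden=64, n_actions=4):
        super().__init__()
        self.hidden = nn.Linear(obs_dim, hidden)
        self.out = nn.Linear(hidden, n_actions)

    def forward(self, obs):
        self.h = torch.relu(self.hidden(obs))  # cache the hidden-layer activations
        return self.out(self.h)

def run_episode(policy, env_step, obs, steps=200):
    """Roll out one episode, recording hidden activations and chosen actions."""
    acts, actions = [], []
    for _ in range(steps):
        with torch.no_grad():
            logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        a = int(torch.argmax(logits))
        acts.append(policy.h.numpy().copy())
        actions.append(a)
        obs = env_step(a)  # hypothetical environment transition
    return np.stack(acts), np.array(actions)

def ablate_units(policy, unit_ids):
    """Partial network ablation: silence selected hidden units by zeroing their weights."""
    with torch.no_grad():
        policy.hidden.weight[unit_ids] = 0.0
        policy.hidden.bias[unit_ids] = 0.0

def activation_action_correlation(activations, actions):
    """Correlate each unit's activation time series with the action sequence."""
    return np.array([
        np.corrcoef(activations[:, i], actions)[0, 1]
        for i in range(activations.shape[1])
    ])

def activation_pca(activations, n_components=2):
    """Project the activation time series to 2-D to inspect behavioral stages."""
    centered = activations - activations.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Running `run_episode` before and after `ablate_units` and comparing the resulting correlation vectors and low-dimensional projections would give a crude analogue of the healthy-versus-damaged comparison the abstract describes.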