Paper Title

Reconstructing Actions To Explain Deep Reinforcement Learning

Paper Authors

Xuan Chen, Zifan Wang, Yucai Fan, Bonan Jin, Piotr Mardziel, Carlee Joe-Wong, Anupam Datta

Paper Abstract

Feature attribution has been a foundational building block for explaining input feature importance in supervised learning with Deep Neural Networks (DNNs), but faces new challenges when applied to deep Reinforcement Learning (RL). We propose a new approach to explaining deep RL actions by defining a class of \emph{action reconstruction} functions that mimic the behavior of a network in deep RL. This approach allows us to answer more complex explainability questions than direct application of DNN attribution methods, which we adapt to \emph{behavior-level attributions} in building our action reconstructions. It also allows us to define \emph{agreement}, a metric for quantitatively evaluating the explainability of our methods. Our experiments on a variety of Atari games suggest that perturbation-based attribution methods are significantly more suitable than alternative attribution methods for reconstructing actions to explain the deep RL agent, and show greater \emph{agreement} than existing explainability work utilizing attention. We further show that action reconstruction allows us to demonstrate how a deep agent learns to play Pac-Man.
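The abstract introduces \emph{agreement} as a quantitative metric but does not spell out its definition in this excerpt. Below is a minimal sketch of one plausible reading, assuming agreement is the fraction of visited states on which the action implied by the reconstruction matches the action the trained agent actually took; the function name and the sample rollout values are hypothetical, not taken from the paper.

```python
import numpy as np

def agreement(agent_actions, reconstructed_actions):
    """Fraction of states on which the reconstructed action
    matches the action actually taken by the deep RL agent."""
    agent_actions = np.asarray(agent_actions)
    reconstructed_actions = np.asarray(reconstructed_actions)
    return float(np.mean(agent_actions == reconstructed_actions))

# Hypothetical usage: discrete Atari actions collected over one rollout.
agent_actions = [2, 2, 3, 0, 1, 2]   # actions chosen by the trained policy
reconstructed = [2, 2, 3, 1, 1, 2]   # actions recovered from attributions
print(agreement(agent_actions, reconstructed))  # 0.833...
```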
