论文标题
通过模型转换解释的强化学习
Explainable Reinforcement Learning via Model Transforms
论文作者
论文摘要
理解加固学习(RL)代理的新兴行为可能很困难,因为这种代理通常使用高度复杂的决策程序在复杂的环境中进行训练。这引起了RL中解释性的多种方法,旨在调和可能在主体行为与观察者预期的行为之间产生的差异。最近的方法取决于域知识,可能并非总是可用的,分析代理商的策略,或者是对基础环境的特定元素的分析,通常被建模为马尔可夫决策过程(MDP)。我们的主要主张是,即使尚不完全了解基础模型(例如,尚未准确地了解过渡概率),也无法由代理(即使用无模型方法)维护,但仍可以利用该模型自动生成解释。为此,我们建议使用以前在文献中使用的正式MDP抽象和转换来加快寻找最佳策略的搜索,以自动产生解释。由于这种转换通常基于环境的符号表示,因此它们可以为预期的和实际的代理行为之间的差距提供有意义的解释。我们正式定义了解释性问题,提出了一类可用于解释新兴行为的转换,并提出了能够有效搜索解释的方法。我们演示了一组标准基准测试的方法。
Understanding emerging behaviors of reinforcement learning (RL) agents may be difficult since such agents are often trained in complex environments using highly complex decision making procedures. This has given rise to a variety of approaches to explainability in RL that aim to reconcile discrepancies that may arise between the behavior of an agent and the behavior that is anticipated by an observer. Most recent approaches have relied either on domain knowledge that may not always be available, on an analysis of the agent's policy, or on an analysis of specific elements of the underlying environment, typically modeled as a Markov Decision Process (MDP). Our key claim is that even if the underlying model is not fully known (e.g., the transition probabilities have not been accurately learned) or is not maintained by the agent (i.e., when using model-free methods), the model can nevertheless be exploited to automatically generate explanations. For this purpose, we suggest using formal MDP abstractions and transforms, previously used in the literature for expediting the search for optimal policies, to automatically produce explanations. Since such transforms are typically based on a symbolic representation of the environment, they can provide meaningful explanations for gaps between the anticipated and actual agent behavior. We formally define the explainability problem, suggest a class of transforms that can be used for explaining emergent behaviors, and suggest methods that enable efficient search for an explanation. We demonstrate the approach on a set of standard benchmarks.