Towards Interpretable-AI Policies Induction using Evolutionary Nonlinear Decision Trees for Discrete Action Systems

Paper title

Towards Interpretable-AI Policies Induction using Evolutionary Nonlinear Decision Trees for Discrete Action Systems

Authors

Yashesh Dhebar, Kalyanmoy Deb, Subramanya Nageshrao, Ling Zhu, Dimitar Filev

Abstract

Black-box AI induction methods such as deep reinforcement learning (DRL) are increasingly being used to find optimal policies for a given control task. Although policies represented using a black-box AI are capable of efficiently executing the underlying control task and achieving optimal closed-loop performance, the developed control rules are often complex and neither interpretable nor explainable. In this paper, we use a recently proposed nonlinear decision-tree (NLDT) approach to find a hierarchical set of control rules that approximates and explains a pre-trained black-box DRL (oracle) agent, maximizing open-loop performance on a labelled state-action dataset. Recent advances in nonlinear optimization using evolutionary computation facilitate finding this hierarchical set of nonlinear control rules as a function of the state variables, using a computationally fast bilevel optimization procedure at each node of the proposed NLDT. Additionally, we propose a re-optimization procedure for enhancing the closed-loop performance of an already derived NLDT. We evaluate our proposed methodologies (open-loop and closed-loop NLDTs) on different control problems having multiple discrete actions. In all these problems, our proposed approach is able to find relatively simple and interpretable rules involving one to four nonlinear terms per rule, while achieving closed-loop performance on par with a trained black-box DRL agent. A post-processing approach for simplifying the NLDT is also suggested. The obtained results are inspiring, as they suggest that complicated black-box DRL policies involving thousands of parameters (which makes them non-interpretable) can be replaced with relatively simple interpretable policies. The results motivate further application of the proposed approach to more complex control tasks.
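To make the idea of a hierarchical set of nonlinear control rules concrete, the following is a minimal sketch of how such a tree could be evaluated as a policy and scored for open-loop performance against oracle-labelled data. The abstract does not give the rule form, so everything here is an assumption for illustration: each split rule is taken to be a weighted sum of power-law terms of the state variables plus a bias, and the class names, function names, example tree, and dataset are all hypothetical. The evolutionary bilevel search that would find the weights and exponents is not shown.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


@dataclass
class Leaf:
    action: int  # index of the discrete action taken at this leaf


@dataclass
class Node:
    # Assumed rule form: bias + sum_i weights[i] * prod_j state[j] ** exponents[i][j]
    weights: Sequence[float]
    exponents: Sequence[Sequence[int]]  # integer powers keep the sketch real-valued
    bias: float
    left: Union["Node", Leaf]   # followed when the rule value is <= 0
    right: Union["Node", Leaf]  # followed when the rule value is > 0


def rule_value(node: Node, state: Sequence[float]) -> float:
    """Evaluate the (assumed) nonlinear split rule at one node."""
    total = node.bias
    for weight, powers in zip(node.weights, node.exponents):
        term = 1.0
        for x, b in zip(state, powers):
            term *= x ** b  # negative powers require a nonzero state value
        total += weight * term
    return total


def act(tree: Union[Node, Leaf], state: Sequence[float]) -> int:
    """Walk from the root to a leaf and return its discrete action."""
    while isinstance(tree, Node):
        tree = tree.left if rule_value(tree, state) <= 0 else tree.right
    return tree.action


def open_loop_accuracy(tree, dataset: Sequence[Tuple[Sequence[float], int]]) -> float:
    """Fraction of labelled (state, oracle action) pairs the tree reproduces."""
    hits = sum(1 for state, action in dataset if act(tree, state) == action)
    return hits / len(dataset)


# Hypothetical two-rule tree over a two-dimensional state (x0, x1):
#   root:  x0 * x1 - 0.5 <= 0  -> action 0
#   child: x0**2 - 4.0   <= 0  -> action 1, otherwise action 2
example_tree = Node(
    weights=[1.0], exponents=[[1, 1]], bias=-0.5,
    left=Leaf(0),
    right=Node(weights=[1.0], exponents=[[2, 0]], bias=-4.0,
               left=Leaf(1), right=Leaf(2)),
)

# Made-up labelled data standing in for oracle state-action pairs.
example_data = [((1.0, 0.2), 0), ((1.0, 1.0), 1), ((3.0, 1.0), 1)]
```

With this tree, `act(example_tree, (3.0, 1.0))` follows both rules to the right and returns action 2, so it disagrees with the third made-up label and `open_loop_accuracy(example_tree, example_data)` is two out of three. Closed-loop performance, by contrast, would have to be measured by running the tree in the control environment, which this sketch does not attempt.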

Paper title