线性时间逻辑任务下马尔可夫决策过程的熵率最大化

论文标题

线性时间逻辑任务下马尔可夫决策过程的熵率最大化

Entropy Rate Maximization of Markov Decision Processes under Linear Temporal Logic Tasks

论文作者

Chen, Yu, Li, Shaoyuan, Yin, Xiang

论文摘要

我们研究了Markov决策过程（MDP）综合具有定性和定量目标的最佳控制策略的问题。具体而言，我们的目标是实现给定的线性时间逻辑（LTL）任务，同时最大化系统的\ emph {熵率}。熵率的概念表征了随机过程的长期平均值（UN）可预测性。从安全的角度来看，这种最佳政策尤其是我们感兴趣的，因为它不仅可以确保完成任务的完成，而且可以最大程度地提高系统的不可预测性。但是，现有的作品仅着重于最大化总熵，这可能会在无限的地平线上脱离无穷大。在本文中，我们为LTL约束下的熵率最大化问题提供了完整的解决方案。具体而言，我们首先提出了一种用于合成熵率的算法，以最大化通信MDP的策略。然后，基于一种新的状态分类方法，我们显示LTL任务下的熵率最大化问题可以在多项式时间中有效解决。我们根据机器人任务计划方案的两个案例研究来说明拟议的算法。

We investigate the problem of synthesizing optimal control policies for Markov decision processes (MDPs) with both qualitative and quantitative objectives. Specifically, our goal is to achieve a given linear temporal logic (LTL) task with probability one, while maximizing the \emph{entropy rate} of the system. The notion of entropy rate characterizes the long-run average (un)predictability of a stochastic process. Such an optimal policy is of our interest, in particular, from the security point of view, as it not only ensures the completion of tasks, but also maximizes the unpredictability of the system. However, existing works only focus on maximizing the total entropy which may diverge to infinity for infinite horizon. In this paper, we provide a complete solution to the entropy rate maximization problem under LTL constraints. Specifically, we first present an algorithm for synthesizing entropy rate maximizing policies for communicating MDPs. Then based on a new state classification method, we show the entropy rate maximization problem under LTL task can be effectively solved in polynomial-time. We illustrate the proposed algorithm based on two case studies of robot task planning scenario.

下载PDF全文

下载文献需遵守相关版权规定

论文标题