Paper Title

Probabilistic Planning with Partially Ordered Preferences over Temporal Goals

Authors

Hazhar Rahmani, Abhishek N. Kulkarni, Jie Fu

Abstract

In this paper, we study planning in stochastic systems, modeled as Markov decision processes (MDPs), with preferences over temporally extended goals. Prior work on temporal planning with preferences assumes that the user preferences form a total order, meaning that every pair of outcomes is comparable. In this work, we consider the case where the preferences over possible outcomes form a partial order rather than a total order. We first introduce a variant of the deterministic finite automaton, referred to as a preference DFA, for specifying the user's preferences over temporally extended goals. Based on order theory, we translate the preference DFA into a preference relation over policies for probabilistic planning in a labeled MDP. In this treatment, a most preferred policy induces a weak-stochastic nondominated probability distribution over the finite paths in the MDP. The proposed planning algorithm hinges on the construction of a multi-objective MDP. We prove that a weak-stochastic nondominated policy given the preference specification is Pareto-optimal in the constructed multi-objective MDP, and vice versa. Throughout the paper, we employ a running example to demonstrate the proposed preference specification and solution approaches. We show the efficacy of our algorithm using the example with detailed analysis, and then discuss possible future directions.
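The abstract's key solution concept is Pareto optimality in the constructed multi-objective MDP: a policy is kept only if no other policy improves on it in every objective. The sketch below is not the paper's algorithm, only a minimal illustration of the nondominance filter over objective vectors; the example vectors (hypothetically, each policy's satisfaction probabilities for two temporal goals) are invented for illustration.

```python
from typing import List, Tuple

def dominates(u: Tuple[float, ...], v: Tuple[float, ...]) -> bool:
    """u dominates v if u is at least as good in every objective
    and strictly better in at least one (maximization)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_nondominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the vectors not dominated by any other vector."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical example: each vector is one policy's satisfaction
# probabilities for two temporally extended goals.
policies = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.1, 0.8)]
print(pareto_nondominated(policies))  # (0.4, 0.4) is dominated by (0.5, 0.5)
```

Because the goals are only partially ordered, several incomparable vectors can survive the filter, which is why the paper seeks the whole nondominated set rather than a single optimum.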
