Title
Order Matters: Generating Progressive Explanations for Planning Tasks in Human-Robot Teaming
Authors
Abstract
Prior work on generating explanations in a planning and decision-making context has focused on providing the rationale behind an AI agent's decision making. While these methods provide the right explanations from the explainer's perspective, they fail to heed the cognitive requirements of understanding an explanation from the explainee's (the human's) perspective. In this work, we set out to address this issue by first considering the influence of information order in an explanation, or the progressiveness of explanations. Intuitively, progression builds later concepts on previous ones and is known to contribute to better learning. In this work, we aim to investigate similar effects during explanation generation when an explanation is broken into multiple parts that are communicated sequentially. The challenge lies in modeling humans' preferences for the order in which information is received in such explanations so as to assist understanding. Given this sequential process, we present a formulation based on a goal-based MDP for generating progressive explanations. The reward function of this MDP is learned via inverse reinforcement learning from explanations retrieved through human subject studies. We first evaluated our approach on a scavenger-hunt domain to demonstrate its effectiveness in capturing humans' preferences. Analysis of the results revealed something more fundamental: the preferences arise strongly from both domain-dependent and domain-independent features. The correlation with domain-independent features prompted us to verify this result further in an escape-room domain. The results confirmed our hypothesis that understanding an explanation is a dynamic process. The human preferences that reflect this aspect correspond exactly to the progression of knowledge assimilation hidden deeper in our cognitive processes.
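To make the formulation concrete, the following is a minimal sketch, not the paper's actual model: it treats progressive explanation as a goal-based MDP whose state is the set of explanation parts already communicated and whose actions add one remaining part. The part names, prerequisite structure, and reward weights below are all illustrative assumptions; the hand-set weights stand in for a reward function that the paper learns via inverse reinforcement learning from human subject data.

```python
from itertools import permutations

def transition_reward(communicated, next_part, prereqs, weights):
    """Reward for communicating `next_part` given parts already communicated.

    prereqs[p] lists parts that p conceptually builds on; progression is
    rewarded when those prerequisites were covered earlier, and penalized
    when they are still missing. (Illustrative features, not the paper's.)
    """
    required = prereqs.get(next_part, [])
    satisfied = sum(1 for q in required if q in communicated)
    missing = len(required) - satisfied
    return weights["progression"] * satisfied - weights["gap"] * missing

def best_order(parts, prereqs, weights):
    """Exhaustively search orderings (fine for small explanations)."""
    def total(order):
        communicated, score = set(), 0.0
        for p in order:
            score += transition_reward(communicated, p, prereqs, weights)
            communicated.add(p)
        return score
    return max(permutations(parts), key=total)

# Hypothetical explanation parts with a simple "builds-on" structure.
parts = ["goal", "constraint", "action-choice"]
prereqs = {"constraint": ["goal"], "action-choice": ["goal", "constraint"]}
weights = {"progression": 1.0, "gap": 2.0}

order = best_order(parts, prereqs, weights)
# With these weights, each part is communicated after what it builds on:
# ("goal", "constraint", "action-choice")
```

The exhaustive search is only for illustration; with a learned reward, the same objective can be optimized by standard MDP solution methods over the sequential communication process.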