First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization

Paper Title

First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization

Authors

Siddharth Reddy, Sergey Levine, Anca D. Dragan

Abstract

How can we train an assistive human-machine interface (e.g., an electromyography-based limb prosthesis) to translate a user's raw command signals into the actions of a robot or computer when there is no prior mapping, we cannot ask the user for supervision in the form of action labels or reward feedback, and we do not have prior knowledge of the tasks the user is trying to accomplish? The key idea in this paper is that, regardless of the task, when an interface is more intuitive, the user's commands are less noisy. We formalize this idea as a completely unsupervised objective for optimizing interfaces: the mutual information between the user's command signals and the induced state transitions in the environment. To evaluate whether this mutual information score can distinguish between effective and ineffective interfaces, we conduct an observational study on 540K examples of users operating various keyboard and eye gaze interfaces for typing, controlling simulated robots, and playing video games. The results show that our mutual information scores are predictive of the ground-truth task completion metrics in a variety of domains, with an average Spearman's rank correlation of 0.43. In addition to offline evaluation of existing interfaces, we use our unsupervised objective to learn an interface from scratch: we randomly initialize the interface, have the user attempt to perform their desired tasks using the interface, measure the mutual information score, and update the interface to maximize mutual information through reinforcement learning. We evaluate our method through a user study with 12 participants who perform a 2D cursor control task using a perturbed mouse, and an experiment with one user playing the Lunar Lander game using hand gestures. The results show that we can learn an interface from scratch, without any user supervision or prior knowledge of tasks, in under 30 minutes.
