Paper Title


Fitting a Linear Control Policy to Demonstrations with a Kalman Constraint

Paper Authors

Malayandi Palan, Shane Barratt, Alex McCauley, Dorsa Sadigh, Vikas Sindhwani, Stephen Boyd

Paper Abstract


We consider the problem of learning a linear control policy for a linear dynamical system, from demonstrations of an expert regulating the system. The standard approach to this problem is policy fitting, which fits a linear policy by minimizing a loss function between the demonstrations and the policy's outputs plus a regularization function that encodes prior knowledge. Despite its simplicity, this method fails to learn policies with low or even finite cost when there are few demonstrations. We propose to add an additional constraint to policy fitting, that the policy is the solution to some LQR problem, i.e., optimal in the stochastic control sense for some choice of quadratic cost. We refer to this constraint as a Kalman constraint. Policy fitting with a Kalman constraint requires solving an optimization problem with convex cost and bilinear constraints. We propose a heuristic method, based on the alternating direction method of multipliers (ADMM), to approximately solve this problem. Numerical experiments demonstrate that adding the Kalman constraint allows us to learn good, i.e., low cost, policies even when very few data are available.
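The sketch below is a minimal, hedged illustration of the two ingredients the abstract describes, not the paper's actual method or experimental setup. The first part performs standard policy fitting: it fits a linear policy u = Kx to demonstrations by minimizing a squared loss plus a ridge regularizer, as a convex problem in cvxpy. The second part shows what the Kalman constraint refers to: for a fixed quadratic cost (Q, R), the LQR-optimal gain is recovered from the discrete algebraic Riccati equation; the paper's constraint requires the fitted K to equal such a gain for some choice of (Q, R), which leads to the bilinear constraints solved approximately by ADMM. All problem sizes, the synthetic data (A, B, X, U), the regularization weight lam, and the illustrative cost (Q, R) are made-up placeholders.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
n, m, N = 4, 2, 6  # state dim, input dim, number of demonstrations (assumed)

# Synthetic stand-ins for the linear dynamics and the expert demonstrations.
A = 0.5 * rng.standard_normal((n, n))  # x_{t+1} = A x_t + B u_t + w_t
B = rng.standard_normal((n, m))
X = rng.standard_normal((N, n))        # demonstrated states x_i
U = rng.standard_normal((N, m))        # demonstrated expert inputs u_i

# Standard policy fitting: minimize the squared loss between K x_i and u_i
# plus a ridge regularizer encoding the prior that K should be small.
K = cp.Variable((m, n))
lam = 0.1  # regularization weight (assumed)
cp.Problem(cp.Minimize(cp.sum_squares(X @ K.T - U) / N
                       + lam * cp.sum_squares(K))).solve()
K_fit = K.value

# The Kalman constraint asks that K be LQR-optimal for *some* quadratic
# cost. For one fixed cost (Q, R), the optimal gain (with u = K x) is
# K = -(R + B' P B)^{-1} B' P A, where P solves the discrete-time ARE.
Q, R = np.eye(n), np.eye(m)  # one illustrative cost choice
P = solve_discrete_are(A, B, Q, R)
K_lqr = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

print("fitted K:\n", np.round(K_fit, 3))
print("LQR gain for (Q, R) = (I, I):\n", np.round(K_lqr, 3))
```

With few demonstrations, nothing forces K_fit above to stabilize the system, which is the failure mode the abstract points out; searching over (Q, R) so that K_fit matches an LQR gain is what couples the two parts and makes the joint problem bilinear.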
