Differentiable Optimal Control via Differential Dynamic Programming

Paper title

Differentiable Optimal Control via Differential Dynamic Programming

Authors

Traiko Dinev, Carlos Mastalli, Vladimir Ivan, Steve Tonneau, Sethu Vijayakumar

Abstract

Robot design optimization, imitation learning and system identification share a common problem, which requires optimizing over robot or task parameters at the same time as optimizing the robot motion. To solve these problems, we can use differentiable optimal control, for which the gradients of the robot's motion with respect to the parameters are required. We propose a method to efficiently compute these gradients analytically via the differential dynamic programming (DDP) algorithm using sensitivity analysis (SA). We show that we must include second-order dynamics terms when computing the gradients. However, we do not need to include them when computing the motion. We validate our approach on the pendulum and double pendulum systems. Furthermore, we compare against using the derivatives of the iterative linear quadratic regulator (iLQR), which ignores these second-order terms everywhere, on a co-design task for the Kinova arm, where we optimize the link lengths of the robot for a target reaching task. We show that optimizing using iLQR gradients diverges, as ignoring the second-order dynamics affects the computation of the derivatives. Instead, optimizing using DDP gradients converges to the same optimum for a range of initial designs, allowing our formulation to scale to complex systems.
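
The abstract's central claim (second-order dynamics terms are needed for the gradients but not for computing the motion) can be illustrated on a toy problem. The sketch below is not the paper's DDP algorithm: it is a hypothetical one-step analogue in which a single control `u` is chosen to minimize `0.5*R*u**2 + 0.5*(theta*sin(u) - TARGET)**2`, with `theta` playing the role of a design parameter such as a link length. The "dynamics" `theta*sin(u)`, the weights `R` and `TARGET`, and all function names are assumptions made for illustration. The optimum is found with a Gauss-Newton Hessian (the iLQR-like choice), and the sensitivity of the optimal control to `theta` is then computed from the implicit function theorem, with and without the second-order terms, and compared against finite differences.

```python
import math

R = 0.1        # control-cost weight (illustrative value)
TARGET = 0.5   # desired end position (illustrative value)


def residual(u, theta):
    """Task error for the toy 'dynamics' f(u, theta) = theta * sin(u)."""
    return theta * math.sin(u) - TARGET


def grad_u(u, theta):
    """dJ/du for J = 0.5*R*u**2 + 0.5*residual**2."""
    return R * u + residual(u, theta) * theta * math.cos(u)


def hess_uu(u, theta, second_order):
    """d2J/du2; the Gauss-Newton (iLQR-like) form drops the f_uu term."""
    h = R + (theta * math.cos(u)) ** 2
    if second_order:
        h += residual(u, theta) * (-theta * math.sin(u))  # residual * f_uu
    return h


def hess_u_theta(u, theta, second_order):
    """d2J/(du dtheta); the Gauss-Newton form drops the f_u_theta term."""
    h = math.sin(u) * theta * math.cos(u)  # f_theta * f_u
    if second_order:
        h += residual(u, theta) * math.cos(u)  # residual * f_u_theta
    return h


def solve(theta, second_order, iters=100):
    """Newton-type iteration on dJ/du = 0, starting from u = 0."""
    u = 0.0
    for _ in range(iters):
        u -= grad_u(u, theta) / hess_uu(u, theta, second_order)
    return u


def sensitivity(u, theta, second_order):
    """du*/dtheta from the implicit function theorem at the optimum."""
    return -hess_u_theta(u, theta, second_order) / hess_uu(u, theta, second_order)


theta = 1.0
u_gn = solve(theta, second_order=False)    # motion found without second-order terms
u_full = solve(theta, second_order=True)   # motion found with second-order terms
eps = 1e-5
fd = (solve(theta + eps, False) - solve(theta - eps, False)) / (2 * eps)

print("optimal u (Gauss-Newton vs full Newton):", u_gn, u_full)
print("du*/dtheta, finite difference:", fd)
print("du*/dtheta, with second-order terms:", sensitivity(u_gn, theta, True))
print("du*/dtheta, without second-order terms:", sensitivity(u_gn, theta, False))
```

In this toy setting both solvers reach the same optimal control, because the Hessian approximation only changes the step taken, not the stationarity condition. The sensitivity, however, depends on the exact Hessian at the optimum, so only the version that keeps the second-order terms agrees with the finite-difference estimate.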
