Paper title
Introduction to Online Control
Authors
Abstract
This text presents an introduction to an emerging paradigm in control of dynamical systems and differentiable reinforcement learning called online nonstochastic control. The new approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control. The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, both the cost functions and the perturbations from the assumed dynamical model are chosen by an adversary. Thus, the optimal policy is not defined a priori. Rather, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies. This objective suggests the use of the decision-making framework of online convex optimization as an algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms and are accompanied by finite-time regret and computational complexity guarantees.
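To make the notion of "regret against the best decision in hindsight" concrete, the following is a minimal sketch of projected online gradient descent, a standard online convex optimization algorithm, on a toy one-dimensional problem. It is not an algorithm taken from the text and involves no dynamical system. The quadratic cost `(x - theta)**2`, the interval `[-radius, radius]`, and the function names `project`, `online_gradient_descent`, and `regret` are all assumptions made for illustration.

```python
import math


def project(x, radius):
    """Clip x onto the interval [-radius, radius] (the feasible set, assumed)."""
    return max(-radius, min(radius, x))


def online_gradient_descent(targets, radius=1.0):
    """Play x_t, then observe the cost c_t(x) = (x - targets[t])**2.

    The targets may be chosen adversarially; the learner sees each one only
    after committing to its decision. Returns the list of decisions played.
    """
    diameter = 2.0 * radius       # D: width of the feasible interval
    grad_bound = 4.0 * radius     # G: bound on |c_t'(x)| when |theta| <= radius
    x = 0.0
    decisions = []
    for t, theta in enumerate(targets, start=1):
        decisions.append(x)                        # commit before seeing theta
        grad = 2.0 * (x - theta)                   # gradient of the revealed cost
        eta = diameter / (grad_bound * math.sqrt(t))  # decaying step size
        x = project(x - eta * grad, radius)
    return decisions


def regret(decisions, targets, radius=1.0):
    """Learner's total cost minus that of the best fixed decision in hindsight."""
    learner_cost = sum((x - th) ** 2 for x, th in zip(decisions, targets))
    # For a sum of these quadratics on an interval, the best fixed point is
    # the mean of the targets clipped to the interval.
    best = project(sum(targets) / len(targets), radius)
    best_cost = sum((best - th) ** 2 for th in targets)
    return learner_cost - best_cost
```

With this step size, the standard analysis of online gradient descent bounds the regret by (3/2)·G·D·√T for any target sequence in the interval, which is 12·√T when `radius=1.0`. The bound grows sublinearly in T, so the average regret per round vanishes. Online nonstochastic control seeks guarantees of this kind, with the comparator being a class of control policies rather than a fixed point.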