Information Theoretic Regret Bounds for Online Nonlinear Control

Paper Title

Information Theoretic Regret Bounds for Online Nonlinear Control

Paper Authors

Kakade, Sham; Krishnamurthy, Akshay; Lowrey, Kendall; Ohnishi, Motoya; Sun, Wen

Abstract

This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space. This framework yields a general setting that permits discrete and continuous control inputs as well as non-smooth, non-differentiable dynamics. Our main result, the Lower Confidence-based Continuous Control ($LC^3$) algorithm, enjoys a near-optimal $O(\sqrt{T})$ regret bound against the optimal controller in episodic settings, where $T$ is the number of episodes. The bound has no explicit dependence on the dimension of the system dynamics, which could be infinite, but instead depends only on information theoretic quantities. We empirically show its application to a number of nonlinear control tasks and demonstrate the benefit of exploration for learning model dynamics.
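The abstract names two ingredients without spelling out the algorithm: a dynamics model that lives in a known RKHS, and a regret bound driven by information theoretic quantities instead of the state dimension. The sketch below illustrates both under stated assumptions. It fits one output coordinate of the dynamics with kernel ridge regression (equivalently, the posterior mean and standard deviation of a Gaussian process with noise variance `lam`), reports a confidence width that grows away from the data, and computes the information gain 0.5 * log det(I + K / lam), a standard quantity in kernelized bandit and control regret analyses. The RBF kernel, `lam`, the toy system, and every name here (`KernelDynamicsModel`, `width`, `information_gain`) are hypothetical; this is not the authors' LC^3 implementation, and the planning step that, as the algorithm's name suggests, relies on lower confidence bounds on cost is not shown.

```python
import math

# Hypothetical sketch: RBF kernel, lam, and the toy dynamics are assumptions, not from the paper.

def rbf(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel on equal-length tuples."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

class KernelDynamicsModel:
    """Kernel ridge regression for one coordinate of x_next = f(x, u) + noise."""

    def __init__(self, inputs, targets, lam=1.0, kernel=rbf):
        self.inputs = [tuple(z) for z in inputs]  # each z = (state..., control...)
        self.lam = lam
        self.kernel = kernel
        n = len(self.inputs)
        # Regularized Gram matrix K + lam * I, factored once and reused.
        reg = [[kernel(a, b) + (lam if i == j else 0.0)
                for j, b in enumerate(self.inputs)]
               for i, a in enumerate(self.inputs)]
        self.chol = cholesky(reg)
        self.alpha = solve_chol(self.chol, list(targets))

    def mean(self, z):
        """Prediction k(z)^T (K + lam I)^{-1} y."""
        k = [self.kernel(tuple(z), a) for a in self.inputs]
        return sum(ki * ai for ki, ai in zip(k, self.alpha))

    def width(self, z):
        """GP posterior std: sqrt(k(z, z) - k(z)^T (K + lam I)^{-1} k(z))."""
        k = [self.kernel(tuple(z), a) for a in self.inputs]
        v = solve_chol(self.chol, k)
        var = self.kernel(tuple(z), tuple(z)) - sum(ki * vi for ki, vi in zip(k, v))
        return math.sqrt(max(var, 0.0))  # clip tiny negative round-off

    def information_gain(self):
        """0.5 * log det(I + K / lam), using det(K + lam I) = lam^n det(I + K / lam)."""
        n = len(self.inputs)
        logdet = 2.0 * sum(math.log(self.chol[i][i]) for i in range(n))
        return 0.5 * (logdet - n * math.log(self.lam))

# Toy 1-D system (assumed): next state = sin(x) + 0.5 * u.
data = [((x, u), math.sin(x) + 0.5 * u) for x in (-1.0, 0.0, 1.0) for u in (-1.0, 1.0)]
model = KernelDynamicsModel([z for z, _ in data], [y for _, y in data], lam=0.01)
print(model.mean((0.5, 0.0)), model.width((0.5, 0.0)), model.width((5.0, 0.0)))
print(model.information_gain())
```

Far from the six training inputs the width approaches the prior standard deviation of 1, which is the signal confidence-based methods use to decide where to explore. The information gain is computed from the eigenvalues of the Gram matrix K, not directly from the input dimension.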
