Paper Title
Meta Learning MDPs with Linear Transition Models
Paper Authors
Paper Abstract
We study meta-learning in Markov Decision Processes (MDPs) with linear transition models in the undiscounted episodic setting. Under a task-sharedness metric based on model proximity, we study task families characterized by a distribution over models specified by a bias term and a variance component. We then propose BUC-MatrixRL, a version of the UC-MatrixRL algorithm, and show that it can meaningfully leverage a set of sampled training tasks to quickly solve a test task sampled from the same task distribution, by learning an estimator of the bias parameter of the task distribution. The analysis leverages and extends results on learning to learn in the linear regression and linear bandit settings to the more general case of MDPs with linear transition models. We prove that, compared to learning the tasks in isolation, BUC-MatrixRL achieves significant improvements in transfer regret for high-bias, low-variance task distributions.
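To make the bias-estimation idea concrete, here is a minimal sketch, assuming the linear transition model of UC-MatrixRL, P(s'|s,a) = phi(s,a)^T M psi(s'), and hypothetical per-task data (Phi, Y) of state-action features and next-state feature targets. The bias-centered ridge estimator below mirrors the learning-to-learn linear regression results the abstract refers to; it is an illustration under these assumptions, not the paper's exact construction.

```python
import numpy as np

def estimate_task_matrix(Phi, Y, lam, M_prior):
    """Ridge estimate of a task's transition core M, regularized toward M_prior:
       argmin_M ||Phi @ M - Y||_F^2 + lam * ||M - M_prior||_F^2.
    Phi: (n, d) state-action features; Y: (n, p) next-state feature targets."""
    d = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(d)          # regularized Gram matrix
    return np.linalg.solve(A, Phi.T @ Y + lam * M_prior)

def estimate_bias(train_tasks, lam):
    """Plug-in estimate of the task-distribution bias: the average of the
       per-task estimates, each regularized toward zero (learning in isolation)."""
    d = train_tasks[0][0].shape[1]
    p = train_tasks[0][1].shape[1]
    zero = np.zeros((d, p))
    Ms = [estimate_task_matrix(Phi, Y, lam, zero) for Phi, Y in train_tasks]
    return np.mean(Ms, axis=0)

# On a test task, regularize toward the learned bias instead of toward zero:
# M_bias = estimate_bias(train_tasks, lam=1.0)
# M_test = estimate_task_matrix(Phi_test, Y_test, lam=1.0, M_prior=M_bias)
```

When the task distribution has high bias and low variance, task matrices cluster tightly around the bias, so shrinking the test-task estimate toward the learned bias rather than toward zero reduces estimation error, which is the intuition behind the transfer-regret improvement claimed above.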