Title
Deep Reinforcement Learning for Adaptive Learning Systems
Authors
Abstract
In this paper, we formulate the adaptive learning problem faced in adaptive learning systems---the problem of finding an individualized learning plan (called a policy) that chooses the most appropriate learning materials based on a learner's latent traits---as a Markov decision process (MDP). We assume the latent traits to be continuous with an unknown transition model. We apply a model-free deep reinforcement learning algorithm---the deep Q-learning algorithm---that can effectively find the optimal learning policy from data on learners' learning processes without knowing the actual transition model of the learners' continuous latent traits. To utilize the available data efficiently, we also develop a transition model estimator that emulates the learner's learning process using neural networks. The transition model estimator can be used within the deep Q-learning algorithm so that it discovers the optimal learning policy for a learner more efficiently. Numerical simulation studies verify that the proposed algorithm is very efficient in finding a good learning policy; in particular, with the aid of the transition model estimator, it can find the optimal learning policy after training on only a small number of learners.
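The Q-learning formulation described in the abstract can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the paper's actual model: the one-dimensional "latent trait" state, its transition dynamics, the reward (trait gain), and the linear Q-function that stands in for a deep Q-network.

```python
# Minimal sketch of Q-learning with function approximation on a toy
# one-dimensional "latent trait" state. The dynamics, reward, and the
# linear Q-function (a stand-in for a deep network) are all invented
# for illustration, not taken from the paper.
import random

N_ACTIONS = 3                       # hypothetical learning materials
ALPHA, GAMMA, EPS = 0.05, 0.9, 0.1  # step size, discount, exploration rate

# Linear Q(s, a) = w[a][0] + w[a][1] * s, playing the role of a deep Q-network.
w = [[0.0, 0.0] for _ in range(N_ACTIONS)]

def q(s, a):
    return w[a][0] + w[a][1] * s

def transition(s, a):
    """Toy learning process: material a nudges the trait toward 1."""
    s_next = s + 0.1 * (a + 1) * (1.0 - s)
    return s_next, s_next - s       # next state, reward = trait gain

random.seed(0)
for episode in range(200):          # each episode simulates one learner
    s = 0.0                         # learner starts with a low trait level
    for _ in range(10):
        if random.random() < EPS:   # epsilon-greedy choice of material
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda b: q(s, b))
        s_next, r = transition(s, a)
        # Q-learning target and a gradient step on the linear weights
        target = r + GAMMA * max(q(s_next, b) for b in range(N_ACTIONS))
        td = target - q(s, a)
        w[a][0] += ALPHA * td
        w[a][1] += ALPHA * td * s
        s = s_next
```

In the paper's setting, the transition model is unknown; the `transition` function above would be replaced either by observed learner data (the model-free case) or by the neural-network transition model estimator, which lets the algorithm generate additional simulated experience and so learn a good policy from fewer real learners.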