Paper Title
PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison
Paper Authors
Paper Abstract
PAC-Bayes has recently re-emerged as an effective theory with which one can derive principled learning algorithms with tight performance guarantees. However, applications of PAC-Bayes to bandit problems are relatively rare, which is a great misfortune. Many decision-making problems in healthcare, finance and the natural sciences can be modelled as bandit problems. In many of these applications, principled algorithms with strong performance guarantees would be very much appreciated. This survey provides an overview of PAC-Bayes bounds for bandit problems and an experimental comparison of these bounds. On the one hand, we found that PAC-Bayes bounds are a useful tool for designing offline bandit algorithms with performance guarantees. In our experiments, a PAC-Bayesian offline contextual bandit algorithm was able to learn randomised neural network policies with competitive expected reward and non-vacuous performance guarantees. On the other hand, the PAC-Bayesian online bandit algorithms that we tested had loose cumulative regret bounds. We conclude by discussing some topics for future work on PAC-Bayesian bandit algorithms.
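For readers unfamiliar with the form such guarantees take, the following is a minimal illustrative sketch; it is the standard Maurer-McAllester PAC-Bayes bound for losses in [0, 1], not a bound taken from this paper. For any prior \(\pi\) fixed before seeing the data, with probability at least \(1 - \delta\) over an i.i.d. sample of size \(n\), simultaneously for all posteriors \(\rho\),

\[
\mathbb{E}_{h \sim \rho}\big[L(h)\big] \;\le\; \mathbb{E}_{h \sim \rho}\big[\hat{L}_n(h)\big] \;+\; \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
\]

where \(L\) is the expected loss, \(\hat{L}_n\) the empirical loss, and \(\mathrm{KL}\) the Kullback-Leibler divergence. In an offline contextual bandit setting of the kind described above, the loss is typically replaced by an estimate of a policy's negative reward (for example, an importance-weighted estimate built from logged data), and evaluating the right-hand side yields a computable performance guarantee of the sort the abstract calls non-vacuous.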