Paper Title

On Thompson Sampling with Langevin Algorithms

Authors

Eric Mazumdar, Aldo Pacchiano, Yi-an Ma, Peter L. Bartlett, Michael I. Jordan

Abstract

Thompson sampling for multi-armed bandit problems is known to enjoy favorable performance in both theory and practice. However, it suffers from a significant limitation computationally, arising from the need for samples from posterior distributions at every iteration. We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue. We construct quickly converging Langevin algorithms to generate approximate samples that have accuracy guarantees, and we leverage novel posterior concentration rates to analyze the regret of the resulting approximate Thompson sampling algorithm. Further, we specify the necessary hyperparameters for the MCMC procedure to guarantee optimal instance-dependent frequentist regret while having low computational complexity. In particular, our algorithms take advantage of both posterior concentration and a sample reuse mechanism to ensure that only a constant number of iterations and a constant amount of data is needed in each round. The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.
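The scheme the abstract describes can be illustrated with a minimal sketch: a Gaussian multi-armed bandit in which each round's posterior sample per arm is produced by a few unadjusted Langevin steps, warm-started from the previous round's sample (the "sample reuse mechanism"), with the step size shrinking as the posterior concentrates. Everything here is an illustrative assumption — the function names, the `N(0,1)` prior, the step-size rule `0.5 / (n/σ² + 1)`, and the inner iteration count are not the paper's exact prescription.

```python
import numpy as np

def ula_step(x, grad_log_post, step, rng):
    """One unadjusted Langevin step: x + step*grad log p(x) + sqrt(2*step)*noise."""
    return x + step * grad_log_post(x) + np.sqrt(2.0 * step) * rng.normal()

def approx_thompson(true_means, horizon=500, sigma=1.0, n_inner=25, seed=0):
    """Approximate Thompson sampling with Langevin posterior samples (sketch)."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)   # number of pulls per arm
    sums = np.zeros(k)     # sum of observed rewards per arm
    samples = np.zeros(k)  # warm starts, reused across rounds
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        for a in range(k):
            n, s = counts[a], sums[a]
            # N(0,1) prior + Gaussian likelihood:
            # grad log p(x | data) = (s - n*x)/sigma^2 - x
            grad = lambda x: (s - n * x) / sigma**2 - x
            # Illustrative step-size rule: shrink as the posterior concentrates,
            # which keeps the inner iteration count constant per round.
            step = 0.5 / (n / sigma**2 + 1.0)
            for _ in range(n_inner):
                samples[a] = ula_step(samples[a], grad, step, rng)
        a = int(np.argmax(samples))          # play the arm with the largest sample
        reward = true_means[a] + sigma * rng.normal()
        counts[a] += 1
        sums[a] += reward
        regret += best - true_means[a]
    return regret, counts

regret, counts = approx_thompson(np.array([0.2, 0.5, 1.0]))
print(f"cumulative regret: {regret:.1f}, arm pulls: {counts}")
```

With the step size tied to the posterior precision, each inner Langevin chain contracts at a fixed rate regardless of how much data an arm has accumulated, so a constant number of inner iterations per round suffices — the property the abstract highlights.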
