论文标题
联合结合的经验过程方法:组合和线性匪徒的实用算法
An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits
论文作者
论文摘要
本文提出了在固定置信度和固定预算设置中纯探索线性匪徒问题的近乎最佳算法。利用经验过程的尖顶理论的思想,我们提供了一种算法,其样品复杂性以实例的几何形状扩展,并避免了对武器数量的明确结合。与以前的方法不同,这些方法是基于最小化最差的案例方差(例如G-最佳设计)的样品,我们定义了一个基于基础臂集的高斯宽度的实验设计目标。我们在这个目标方面提供了一种新颖的下限,该目标突出了其在样本复杂性中的基本作用。我们固定置信算法的样本复杂性与该下限匹配,此外,对于组合类别,例如最短的路径,匹配和矩形,其中手臂套件可以在尺寸中成倍大。最后,我们提出了在固定预算设置中的第一个线性匪徒的算法。它的保证与我们的下限与对数因素相匹配。
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings. Leveraging ideas from the theory of suprema of empirical processes, we provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms. Unlike previous approaches which sample based on minimizing a worst-case variance (e.g. G-optimal design), we define an experimental design objective based on the Gaussian-width of the underlying arm set. We provide a novel lower bound in terms of this objective that highlights its fundamental role in the sample complexity. The sample complexity of our fixed confidence algorithm matches this lower bound, and in addition is computationally efficient for combinatorial classes, e.g. shortest-path, matchings and matroids, where the arm sets can be exponentially large in the dimension. Finally, we propose the first algorithm for linear bandits in the the fixed budget setting. Its guarantee matches our lower bound up to logarithmic factors.