线性匪徒的纯探索游戏化

论文标题

线性匪徒的纯探索游戏化

Gamification of Pure Exploration for Linear Bandits

论文作者

Degenne, Rémy, Ménard, Pierre, Shang, Xuedong, Valko, Michal

论文摘要

我们研究了一个有源的纯探索设置，其中包括最佳臂识别，在线性随机匪徒的背景下。尽管对于标准的多臂匪徒而存在渐近最佳的算法，但尽管有几次尝试解决它，但在线性匪徒中的最佳武器识别算法的存在仍然难以捉摸。首先，在线性情况下，我们提供了对不同最佳概念的彻底比较和新见解，包括G-急诊率，最佳实验设计中的托管最佳性和渐近型最优性。其次，我们设计了第一个渐近最佳算法，用于线性斑块中固定信心纯探索。因此，我们的算法自然绕过了一个简单但困难的例子造成的陷阱，即大多数先前的算法都必须设计以明确处理。最后，我们避免需要通过提供有效实施的方法来充分解决最佳设计问题。

We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-arm bandits, the existence of such algorithms for the best-arm identification in linear bandits has been elusive despite several attempts to address it. First, we provide a thorough comparison and new insight over different notions of optimality in the linear case, including G-optimality, transductive optimality from optimal experimental design and asymptotic optimality. Second, we design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits. As a consequence, our algorithm naturally bypasses the pitfall caused by a simple but difficult instance, that most prior algorithms had to be engineered to deal with explicitly. Finally, we avoid the need to fully solve an optimal design problem by providing an approach that entails an efficient implementation.

下载PDF全文

下载文献需遵守相关版权规定

论文标题