论文标题

概率顺序收缩:与腐败的随机土匪的最佳手臂识别算法

Probabilistic Sequential Shrinking: A Best Arm Identification Algorithm for Stochastic Bandits with Corruptions

论文作者

Zhong, Zixin, Cheung, Wang Chi, Tan, Vincent Y. F.

论文摘要

我们考虑了在T步骤的固定预算设置中的随机匪徒的最佳手臂识别(BAI)问题。我们设计了一种新颖的随机算法,概率的顺序收缩($ u $)(PSS($ u $)),这对腐败数量不可知。当每个步骤的损坏数量(CPS)低于阈值时,PSS($ U $)将确定最佳的手臂或物品,概率趋于$ 1 $,为$ t \ rightarrow \ rightarrow \ infty $。否则,确定项目的最佳差距会与CP优雅地降低。我们认为这样的分叉是必要的。在PSS($ U $)中,参数$ u $用于在最佳差距和成功概率之间进行平衡。注射随机化被证明对于减轻腐败的影响至关重要。为了证明这一点,我们设计了适用于任何算法的两种攻击策略。我们将其中一个应用于Karnin等人的pss($ u $)的确定性类似物($ u $)。 (2013)。攻击策略会导致SH的较高故障概率,但是PSS($ U $)仍然强大。在没有腐败的情况下,PSS($ 2 $)的绩效保证与SH相匹配。我们表明,当CPS足够大时,没有算法可以达到$ 1 $的BAI概率为$ t \ rightarrow \ infty $。数值实验证实了我们的理论发现。

We consider a best arm identification (BAI) problem for stochastic bandits with adversarial corruptions in the fixed-budget setting of T steps. We design a novel randomized algorithm, Probabilistic Sequential Shrinking($u$) (PSS($u$)), which is agnostic to the amount of corruptions. When the amount of corruptions per step (CPS) is below a threshold, PSS($u$) identifies the best arm or item with probability tending to $1$ as $T\rightarrow \infty$. Otherwise, the optimality gap of the identified item degrades gracefully with the CPS.We argue that such a bifurcation is necessary. In PSS($u$), the parameter $u$ serves to balance between the optimality gap and success probability. The injection of randomization is shown to be essential to mitigate the impact of corruptions. To demonstrate this, we design two attack strategies that are applicable to any algorithm. We apply one of them to a deterministic analogue of PSS($u$) known as Successive Halving (SH) by Karnin et al. (2013). The attack strategy results in a high failure probability for SH, but PSS($u$) remains robust. In the absence of corruptions, PSS($2$)'s performance guarantee matches SH's. We show that when the CPS is sufficiently large, no algorithm can achieve a BAI probability tending to $1$ as $T\rightarrow \infty$. Numerical experiments corroborate our theoretical findings.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源