论文标题
在$ \ varepsilon $ -best-Asswer识别线性匪徒中选择答案
Choosing Answers in $\varepsilon$-Best-Answer Identification for Linear Bandits
论文作者
论文摘要
在纯探索问题中,依次收集信息来回答关于随机环境的问题。尽管近年来对线性匪徒的最佳武器识别进行了广泛的研究,但很少有作品专门用于识别一只aft $ \ varepsilon $ close的手臂,以达到最好的武器(而不是最好的一只)。在这个有几个正确答案的问题中,识别算法应重点放在这些答案之间的一个候选人上,并确认其正确。我们证明,以最高平均值选择答案不允许算法就预期的样本复杂性达到渐近的最优性。相反,应标识一个\ textIt {最远的答案}。使用该洞察力仔细选择候选人答案,我们开发了一个简单的过程,以适应最佳臂识别算法,以应对托管线性随机匪徒中的$ \ varepsilon $ best-best-andwer识别。最后,我们为此设置提出了一种渐近最佳算法,该算法证明可以针对现有改良的最佳臂识别算法实现竞争性的经验性能。
In pure-exploration problems, information is gathered sequentially to answer a question on the stochastic environment. While best-arm identification for linear bandits has been extensively studied in recent years, few works have been dedicated to identifying one arm that is $\varepsilon$-close to the best one (and not exactly the best one). In this problem with several correct answers, an identification algorithm should focus on one candidate among those answers and verify that it is correct. We demonstrate that picking the answer with highest mean does not allow an algorithm to reach asymptotic optimality in terms of expected sample complexity. Instead, a \textit{furthest answer} should be identified. Using that insight to choose the candidate answer carefully, we develop a simple procedure to adapt best-arm identification algorithms to tackle $\varepsilon$-best-answer identification in transductive linear stochastic bandits. Finally, we propose an asymptotically optimal algorithm for this setting, which is shown to achieve competitive empirical performance against existing modified best-arm identification algorithms.