Paper Title

Best Arm Identification with Contextual Information under a Small Gap

Authors

Masahiro Kato, Masaaki Imaizumi, Takuya Ishihara, Toru Kitagawa

Abstract

We study the best-arm identification (BAI) problem with a fixed budget and contextual (covariate) information. In each round of an adaptive experiment, after observing contextual information, we choose a treatment arm using past observations and the current context. Our goal is to identify the best treatment arm, that is, the treatment arm with the maximal expected reward marginalized over the contextual distribution, with a minimal probability of misidentification. In this study, we consider a class of nonparametric bandit models that converge to location-shift models when the gaps go to zero. First, we derive lower bounds on the misidentification probability for a certain class of strategies and bandit models (probabilistic models of potential outcomes) under a small-gap regime. A small-gap regime is a situation where the gaps in expected reward between the best and suboptimal treatment arms go to zero, which corresponds to one of the worst cases in identifying the best treatment arm. We then develop the "Random Sampling (RS)-Augmented Inverse Probability Weighting (AIPW) strategy," which is asymptotically optimal in the sense that the probability of misidentification under the strategy matches the lower bound when the budget goes to infinity in the small-gap regime. The RS-AIPW strategy consists of the RS rule, which tracks a target sample allocation ratio, and the recommendation rule, which uses the AIPW estimator.
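To make the two components named in the abstract concrete, the following is a minimal sketch of a random-sampling rule combined with an AIPW-based recommendation. It is an illustration under simplifying assumptions, not the authors' algorithm: the function name `rs_aipw`, the discrete uniformly drawn contexts, the Gaussian rewards, and all parameters are hypothetical. In particular, the target allocation ratio `target` is passed in as a fixed input (each row assumed to sum to 1), whereas the abstract does not specify how the target ratio is obtained.

```python
import random


def rs_aipw(means, target, budget, noise_sd=1.0, seed=0):
    """Illustrative sketch only (not the paper's exact strategy).

    means[x][a]  : true mean reward of arm a under context x (simulation input)
    target[x][a] : assumed fixed sampling probability of arm a under context x;
                   each row is assumed to sum to 1
    """
    rng = random.Random(seed)
    n_ctx, k = len(means), len(means[0])
    sums = [[0.0] * k for _ in range(n_ctx)]   # running reward sums per (context, arm)
    counts = [[0] * k for _ in range(n_ctx)]   # number of draws per (context, arm)
    aipw = [0.0] * k                           # AIPW estimate of each arm's marginal mean

    for _ in range(budget):
        x = rng.randrange(n_ctx)               # observe a context (uniform, by assumption)
        # Plug-in conditional means built from past observations only
        # (0.0 until the pair (x, a) has been observed at least once).
        mu_hat = [sums[x][a] / counts[x][a] if counts[x][a] else 0.0
                  for a in range(k)]
        # Random sampling: draw an arm with the target probabilities for this context.
        arm = rng.choices(range(k), weights=target[x])[0]
        reward = rng.gauss(means[x][arm], noise_sd)
        # AIPW score: inverse-probability-weighted residual plus the plug-in mean.
        for a in range(k):
            correction = (reward - mu_hat[a]) / target[x][a] if a == arm else 0.0
            aipw[a] += (correction + mu_hat[a]) / budget
        sums[x][arm] += reward
        counts[x][arm] += 1

    # Recommendation rule: pick the arm with the largest AIPW estimate.
    best = max(range(k), key=lambda a: aipw[a])
    return best, aipw
```

For example, `rs_aipw([[0.0, 1.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]], budget=2000)` simulates two contexts and two arms with equal sampling probabilities and returns the recommended arm index together with the per-arm AIPW estimates. Because the plug-in means use only past data, the inverse-probability correction keeps each round's score unbiased for the arm's marginal mean in this simplified setting.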
