Paper Title

Contextual Bandits with Knapsacks for a Conversion Model

Authors

Zhen Li, Gilles Stoltz

Abstract

We consider contextual bandits with knapsacks, with an underlying structure between rewards generated and cost vectors suffered. We do so motivated by sales with commercial discounts. At each round, given the stochastic i.i.d. context $\mathbf{x}_t$ and the arm picked $a_t$ (corresponding, e.g., to a discount level), a customer conversion may be obtained, in which case a reward $r(a_t,\mathbf{x}_t)$ is gained and vector costs $c(a_t,\mathbf{x}_t)$ are suffered (corresponding, e.g., to losses of earnings). Otherwise, in the absence of a conversion, the reward and costs are null. The reward and costs achieved are thus coupled through the binary variable measuring conversion or the absence thereof. This underlying structure between rewards and costs is different from the linear structures considered by Agrawal and Devanur [2016] (but we show that the techniques introduced in the present article may also be applied to the case of these linear structures). The adaptive policies exhibited solve at each round a linear program based on upper-confidence estimates of the probabilities of conversion given $a$ and $\mathbf{x}$. This kind of policy is most natural and achieves a regret bound of the typical order $(\mathrm{OPT}/B)\sqrt{T}$, where $B$ is the total budget allowed, OPT is the optimal expected reward achievable by a static policy, and $T$ is the number of rounds.
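The coupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names `ucb_conversion` and `play_round` are hypothetical, the Hoeffding-style confidence bonus is an assumed stand-in for the estimator used in the paper, the counts are assumed to be kept per (arm, context) pair, and the per-round linear program is omitted entirely. The sketch only shows that a single binary conversion variable multiplies both the reward and every cost component.

```python
import math
import random


def ucb_conversion(successes, pulls, t):
    """Upper-confidence estimate of a conversion probability, clipped to [0, 1].

    Assumption: a Hoeffding-style bonus; the abstract does not specify
    the estimator actually used in the paper.
    """
    if pulls == 0:
        return 1.0  # optimistic value before any observation
    bonus = math.sqrt(2.0 * math.log(max(t, 2)) / pulls)
    return min(1.0, successes / pulls + bonus)


def play_round(p_conv, reward, costs, rng):
    """Draw the binary conversion y and couple reward and costs through it.

    Returns (y, y * reward, [y * c for c in costs]); without a conversion
    (y == 0) the reward and all costs are null.
    """
    y = 1 if rng.random() < p_conv else 0
    return y, y * reward, [y * c for c in costs]


rng = random.Random(0)
# Hypothetical numbers: a discounted price of 8.0 and two cost components.
y, gained, suffered = play_round(0.3, 8.0, [2.0, 0.5], rng)
```

In a full policy, the upper-confidence values would feed the linear program solved at each round under the remaining budget; that step is left out here because the abstract does not give its exact form.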
