Paper Title

Budget-Constrained Bandits over General Cost and Reward Distributions

Paper Authors

Semih Cayci, Atilla Eryilmaz, R. Srikant

Paper Abstract

We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values as required by many applications. We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward pairs, $O(\log B)$ regret is achievable for a budget $B > 0$. In order to achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum mean-square error estimation. We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.
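To make the idea in the abstract concrete, here is a minimal simulation sketch of an index policy on the reward-per-cost rate whose confidence width is tightened by the LMMSE residual variance of the reward given the cost. This is not the authors' algorithm: the arm parameters, the `pull`/`run` helpers, the exploration constant `alpha`, and the exact form of the confidence pad are illustrative assumptions under a jointly Gaussian cost-reward model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-arm instance with jointly Gaussian, positively correlated
# (cost, reward) pairs: (mean_cost, mean_reward, var_cost, var_reward, corr).
ARMS = [
    (1.0, 1.0, 0.25, 0.25, 0.8),
    (1.2, 1.5, 0.25, 0.25, 0.8),
    (0.8, 0.6, 0.25, 0.25, 0.8),
]


def pull(k):
    """Draw one correlated (cost, reward) sample from arm k."""
    mc, mr, vc, vr, rho = ARMS[k]
    cov = rho * np.sqrt(vc * vr)
    c, r = rng.multivariate_normal([mc, mr], [[vc, cov], [cov, vr]])
    return c, r


def run(budget, alpha=2.0):
    """UCB-style play on the reward-per-cost rate until the budget is spent."""
    K = len(ARMS)
    costs = [[] for _ in range(K)]
    rewards = [[] for _ in range(K)]
    spent, total_reward, t = 0.0, 0.0, 0

    # Initialize each arm with two pulls so the variance estimates are defined.
    for k in range(K):
        for _ in range(2):
            c, r = pull(k)
            costs[k].append(c)
            rewards[k].append(r)
            spent += c
            total_reward += r
            t += 1

    while spent < budget:  # may overshoot by one pull, as a real policy would
        t += 1
        indices = []
        for k in range(K):
            c = np.array(costs[k])
            r = np.array(rewards[k])
            n = len(c)
            mc, mr = c.mean(), r.mean()
            # LMMSE-style variance reduction: remove the part of the reward
            # explained linearly by the cost; when the pair is correlated,
            # the residual variance is smaller than the raw reward variance,
            # so the confidence pad on the rate estimate shrinks.
            beta = np.cov(c, r, bias=True)[0, 1] / (c.var() + 1e-9)
            resid_var = max(r.var() - beta ** 2 * c.var(), 1e-9)
            rate = mr / max(mc, 1e-9)
            pad = np.sqrt(alpha * resid_var * np.log(t) / n) / max(mc, 1e-9)
            indices.append(rate + pad)
        k = int(np.argmax(indices))
        c, r = pull(k)
        costs[k].append(c)
        rewards[k].append(r)
        spent += c
        total_reward += r

    return total_reward


print(f"total reward collected under the budget: {run(budget=500.0):.1f}")
```

The point of the sketch is the `resid_var` term: when cost and reward are strongly correlated, the linear residual variance is much smaller than the raw reward variance, so the index explores less aggressively on well-characterized arms, which is the intuition behind the LMMSE-based estimators described in the abstract.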
