Paper Title

Online Learning with Cumulative Oversampling: Application to Budgeted Influence Maximization

Paper Authors

Shatian Wang, Shuoguang Yang, Zhen Xu, Van-Anh Truong

Paper Abstract

We propose a cumulative oversampling (CO) method for online learning. Our key idea is to sample parameter estimations from the updated belief space once in each round (similar to Thompson Sampling), and utilize the cumulative samples up to the current round to construct optimistic parameter estimations that asymptotically concentrate around the true parameters as tighter upper confidence bounds compared to the ones constructed with standard UCB methods. We apply CO to a novel budgeted variant of the Influence Maximization (IM) semi-bandits with linear generalization of edge weights, whose offline problem is NP-hard. Combining CO with the oracle we design for the offline problem, our online learning algorithm simultaneously tackles budget allocation, parameter learning, and reward maximization. We show that for IM semi-bandits, our CO-based algorithm achieves a scaled regret comparable to that of the UCB-based algorithms in theory, and performs on par with Thompson Sampling in numerical experiments.
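To make the core idea more concrete, below is a minimal Python sketch of one plausible instantiation of cumulative oversampling for a linear semi-bandit with edge-weight generalization. The class and method names are hypothetical, and the specific optimistic construction (taking the maximum of each edge's estimated weight over all accumulated posterior samples) is an illustrative assumption based on the abstract, not the paper's exact algorithm.

```python
import numpy as np

# Hypothetical sketch of the cumulative oversampling (CO) idea:
# draw ONE Thompson-style posterior sample per round, keep all past samples,
# and use the maximum over accumulated samples as an optimistic estimate.
class CumulativeOversampling:
    def __init__(self, d, lam=1.0, sigma=1.0):
        self.V = lam * np.eye(d)      # regularized Gram matrix
        self.b = np.zeros(d)          # accumulated feature-weighted feedback
        self.sigma = sigma
        self.samples = []             # cumulative posterior samples, one per round

    def sample_round(self, rng):
        """Draw one posterior sample this round and add it to the cumulative pool."""
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b
        theta_t = rng.multivariate_normal(theta_hat, self.sigma**2 * V_inv)
        self.samples.append(theta_t)
        return theta_t

    def optimistic_weight(self, x):
        """Optimistic edge-weight estimate for feature x: max over all stored samples.
        As samples accumulate, this acts as an upper confidence estimate of x^T theta*."""
        return max(float(x @ theta) for theta in self.samples)

    def update(self, x, reward):
        """Standard ridge-regression update after observing edge-level feedback."""
        self.V += np.outer(x, x)
        self.b += reward * x


# Usage sketch: optimistic edge weights would be passed to an offline IM oracle.
rng = np.random.default_rng(0)
co = CumulativeOversampling(d=5)
for t in range(100):
    co.sample_round(rng)
    x = rng.random(5) / np.sqrt(5)       # toy edge feature vector
    w_opt = co.optimistic_weight(x)      # optimistic weight fed to the oracle
    reward = float(rng.random() < 0.3)   # toy Bernoulli edge-activation feedback
    co.update(x, reward)
```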
