Paper title
Fast Change Identification in Multi-Play Bandits and its Applications in Wireless Networks
Paper authors
Abstract
Next-generation wireless services are characterized by a diverse set of requirements; to sustain them, wireless access points need to probe the users in the network periodically. In this regard, we study a novel multi-armed bandit (MAB) setting that mandates probing all the arms periodically while keeping track of the current best arm in a non-stationary environment. In particular, we develop \texttt{TS-GE}, which balances the regret guarantees of classical Thompson sampling (TS) with broadcast probing (BP) of all the arms simultaneously in order to actively detect a change in the reward distributions. The main innovation of the algorithm is identifying the changed arm through an optional subroutine called group exploration (GE), which scales as $\log_2(K)$ for a $K$-armed bandit setting. We characterize the probability of missed detection and the probability of false alarm in terms of the environment parameters. We highlight the conditions under which the regret guarantee of \texttt{TS-GE} outperforms that of state-of-the-art algorithms, in particular \texttt{ADSWITCH} and \texttt{M-UCB}. We demonstrate the efficacy of \texttt{TS-GE} by employing it in two wireless system applications: task offloading in mobile-edge computing (MEC) and an industrial internet-of-things (IIoT) network designed for simultaneous wireless information and power transfer (SWIPT).
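The abstract does not spell out how GE works. One common way to reach a $\log_2(K)$ probe count is binary-indexed group testing, sketched below as an illustration only; it is not necessarily the paper's GE subroutine. The sketch assumes exactly one arm has changed and that a noiseless test can tell whether a probed group contains it. The names `identify_changed_arm` and `group_changed` are hypothetical.

```python
def identify_changed_arm(num_arms, group_changed):
    """Locate a single changed arm with ceil(log2(num_arms)) group probes.

    Assumptions (not from the paper): exactly one arm has changed, and
    group_changed(group) is a noiseless, hypothetical test that returns
    True when the changed arm is in `group` (a list of arm indices).
    """
    # (num_arms - 1).bit_length() equals ceil(log2(num_arms)) for num_arms >= 1.
    num_probes = (num_arms - 1).bit_length()
    changed = 0
    for bit in range(num_probes):
        # Probe every arm whose index has this bit set.
        group = [arm for arm in range(num_arms) if (arm >> bit) & 1]
        if group_changed(group):
            # The changed arm's index has this bit set.
            changed |= 1 << bit
    return changed, num_probes


# Usage: 10 arms, arm 6 has changed; the test is simulated without noise.
true_changed_arm = 6
arm, probes = identify_changed_arm(10, lambda group: true_changed_arm in group)
print(arm, probes)
```

Each probe reveals one bit of the changed arm's index, so the number of group probes grows logarithmically in $K$ rather than linearly. In a real bandit setting the group test would be a statistical change detector with missed detections and false alarms, which this sketch ignores.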