Paper Title
Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach
Paper Authors
Paper Abstract
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity. In this paper, we devise algorithms that learn optimal tilt control policies either from existing data (the so-called passive learning setting) or from data actively generated by the algorithms themselves (the active learning setting). We formalize the design of such algorithms as a Best Policy Identification (BPI) problem in Contextual Linear Multi-Arm Bandits (CL-MAB). An arm represents an antenna tilt update; the context captures the current network conditions; the reward corresponds to an improvement in performance, mixing coverage and capacity; and the objective is to identify, with a given level of confidence, an approximately optimal policy (a function mapping the context to an arm with maximal reward). For CL-MAB in both the active and passive learning settings, we derive information-theoretic lower bounds on the number of samples required by any algorithm that returns an approximately optimal policy with a given level of certainty, and devise algorithms achieving these fundamental limits. We apply our algorithms to the Remote Electrical Tilt (RET) optimization problem in cellular networks and show that they can produce an optimal tilt update policy using far fewer data samples than naive or existing rule-based learning algorithms.
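To make the CL-MAB formalization concrete, below is a minimal Python sketch of the setting: a linear reward model r = θ*ᵀφ(x, a) + noise over context-arm features, a uniform logging policy standing in for the passive learning setting, and a regularized least-squares estimate of θ* followed by the greedy policy. The feature map, dimensions, candidate tilt updates, and sample count are illustrative assumptions; this is not the authors' algorithm, which in addition allocates samples to match the information-theoretic lower bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

CONTEXT_DIM = 4                  # network-condition features (assumed)
ARMS = [-1.0, 0.0, 1.0]          # candidate tilt updates in degrees (assumed)
D = CONTEXT_DIM * len(ARMS)      # dimension of the joint feature map
theta_star = rng.normal(size=D)  # unknown reward parameter (sampled for the demo)

def phi(context, arm_idx):
    """Block one-hot feature map phi(x, a): context placed in arm a's block."""
    v = np.zeros(D)
    v[arm_idx * CONTEXT_DIM:(arm_idx + 1) * CONTEXT_DIM] = context
    return v

def reward(context, arm_idx):
    """Linear reward theta*^T phi(x, a) plus Gaussian noise."""
    return theta_star @ phi(context, arm_idx) + 0.1 * rng.normal()

# Passive learning: fit theta from logged (context, arm, reward) triples
# collected under a uniform logging policy, via regularized least squares.
lam = 1.0
A = lam * np.eye(D)              # regularized design matrix
b = np.zeros(D)
for _ in range(500):                  # 500 logged interactions (assumed)
    x = rng.normal(size=CONTEXT_DIM)  # observed network condition
    a = rng.integers(len(ARMS))       # logging policy: uniform over arms
    f = phi(x, a)
    A += np.outer(f, f)
    b += reward(x, a) * f

theta_hat = np.linalg.solve(A, b)

def policy(context):
    """Estimated policy: map a context to the arm with highest predicted reward."""
    return max(range(len(ARMS)), key=lambda a: theta_hat @ phi(context, a))

print("recommended tilt update:", ARMS[policy(rng.normal(size=CONTEXT_DIM))])
```

Under the linear reward assumption, the greedy policy on the estimated parameter is the natural BPI output: once θ* is estimated to sufficient accuracy, the arm maximizing the predicted reward coincides with the optimal tilt update for each context with the prescribed confidence.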