Multi-Armed Bandits for Decentralized AP selection in Enterprise WLANs

Paper title

Multi-Armed Bandits for Decentralized AP selection in Enterprise WLANs

Authors

Marc Carrascosa, Boris Bellalta

Abstract

WiFi densification creates multiple overlapping coverage areas, which allows user stations (STAs) to choose between different Access Points (APs). The standard WiFi association method makes each STA select the AP with the strongest signal, which in many cases leaves some APs underutilized while others are overcrowded. To mitigate this situation, Reinforcement Learning techniques such as Multi-Armed Bandits can be used to dynamically learn the optimal mapping between APs and STAs, and so redistribute the STAs among the available APs accordingly. This is an especially challenging problem because the network response observed by a given STA depends on the behavior of the others, and so it is very difficult to predict without a global view of the network. In this paper, we focus on solving this problem in a decentralized way, where STAs independently explore the different APs inside their coverage range and select the one that best satisfies their needs. To do so, we propose a novel approach called Opportunistic epsilon-greedy with Stickiness, which halts the exploration when a suitable AP is found and resumes it only after several unsatisfactory association rounds. With this approach, we significantly reduce the network response dynamics, improving the ability of the STAs to find a solution quickly and achieving a more efficient use of the network resources. We investigate how the characteristics of the scenario (position of the APs and STAs, traffic loads, and channel allocation strategies) impact the learning process and the achievable performance. We also show that not all the STAs have to implement the proposed solution to improve their performance. Finally, we study the case where stations arrive progressively in the system, showing that the considered approach is also suitable in such a non-stationary set-up.
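
The abstract only outlines the mechanism, so the following is a minimal Python sketch of how a single STA might behave under an epsilon-greedy rule with a stickiness counter. It is not the authors' implementation. The class name `StickyEpsilonGreedy`, the parameters `epsilon`, `stickiness`, and `target`, the reward scale, and the choice to reset the counter after a satisfactory round are all assumptions made for illustration.

```python
import random


class StickyEpsilonGreedy:
    """Illustrative agent for one STA (hypothetical, not the paper's code)."""

    def __init__(self, n_aps, epsilon=0.1, stickiness=3, target=1.0, rng=None):
        self.n_aps = n_aps            # number of APs in coverage range
        self.epsilon = epsilon        # exploration probability (assumed value)
        self.stickiness = stickiness  # unsatisfactory rounds tolerated (assumed)
        self.target = target          # reward counted as "suitable" (assumed)
        self.rng = rng or random.Random()
        self.counts = [0] * n_aps     # times each AP was tried
        self.means = [0.0] * n_aps    # running mean reward per AP
        self.current = None           # AP the STA is associated with
        self.stuck = False            # True while exploration is halted
        self.bad_rounds = 0           # consecutive unsatisfactory rounds

    def select_ap(self):
        # While stuck, keep the current AP and do not explore.
        if self.stuck:
            return self.current
        # Otherwise, ordinary epsilon-greedy over the estimated means.
        if self.current is None or self.rng.random() < self.epsilon:
            self.current = self.rng.randrange(self.n_aps)
        else:
            self.current = max(range(self.n_aps), key=lambda a: self.means[a])
        return self.current

    def update(self, reward):
        # Incremental update of the mean reward of the current AP.
        a = self.current
        self.counts[a] += 1
        self.means[a] += (reward - self.means[a]) / self.counts[a]
        if reward >= self.target:
            # Suitable AP found: halt exploration and reset the counter.
            self.stuck = True
            self.bad_rounds = 0
        elif self.stuck:
            # Unsatisfactory round while stuck: resume exploration only
            # after `stickiness` such rounds in a row.
            self.bad_rounds += 1
            if self.bad_rounds >= self.stickiness:
                self.stuck = False
                self.bad_rounds = 0
```

In a simulation loop, each STA would call `select_ap()` at the start of an association round and `update(reward)` at its end, where the reward is whatever satisfaction measure the scenario defines (for example, achieved throughput divided by required throughput, which is again an assumption).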
