Paper Title
Analysis of Reinforcement Learning Schemes for Trajectory Optimization of an Aerial Radio Unit
Paper Authors
Paper Abstract
This paper introduces the deployment of unmanned aerial vehicles (UAVs) as lightweight wireless access points that leverage the fixed infrastructure in the context of the emerging open radio access network (O-RAN). More precisely, we propose an aerial radio unit that dynamically serves an underserved area and connects to the distributed unit via a wireless fronthaul between the UAV and the closest tower. In this paper, we analyze the UAV trajectory using artificial intelligence (AI) when the UAV simultaneously serves both user equipments (UEs) and central units (CUs) over a multiple-input multiple-output (MIMO) fading channel. We first demonstrate the nonconvexity of the problem of maximizing the overall network throughput with respect to the UAV location, and then solve it with two different machine-learning approaches. We first model the environment as a gridworld and let the UAV explore it by flying from point A to point B, using both the offline Q-learning and the online SARSA algorithms, with the achieved path loss as the reward. To maximize the average payoff, the trajectory in the second scenario is described as a Markov decision process (MDP). According to simulations, the MDP approach produces better results in smaller environments and in less time; in contrast, SARSA performs better in larger environments at the expense of a longer flight duration.
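To make the gridworld setup concrete, the sketch below contrasts the two tabular updates the abstract mentions, Q-learning and SARSA, on a toy grid. The grid size, reward shaping (a hypothetical path-loss-style penalty that improves near the target cell "B"), and all hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
import random
from collections import defaultdict

random.seed(0)

# Assumed 5x5 gridworld; the UAV flies from "point A" (0,0) to "point B".
SIZE = 5
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1             # illustrative hyperparameters

def step(state, a):
    """Move within grid bounds; reward is a stand-in for achieved path loss
    (less negative closer to the goal, where coverage is assumed best)."""
    x = min(max(state[0] + a[0], 0), SIZE - 1)
    y = min(max(state[1] + a[1], 0), SIZE - 1)
    nxt = (x, y)
    reward = -(abs(GOAL[0] - x) + abs(GOAL[1] - y))
    return nxt, reward, nxt == GOAL

def eps_greedy(Q, s):
    if random.random() < EPS:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda i: Q[(s, i)])

def train(on_policy, episodes=500):
    Q = defaultdict(float)
    for _ in range(episodes):
        s, a = (0, 0), eps_greedy(Q, (0, 0))
        done = False
        while not done:
            s2, r, done = step(s, ACTIONS[a])
            a2 = eps_greedy(Q, s2)
            if on_policy:   # SARSA: bootstrap on the action actually taken
                target = r + GAMMA * Q[(s2, a2)] * (not done)
            else:           # Q-learning: bootstrap on the greedy action
                best = max(Q[(s2, i)] for i in range(len(ACTIONS)))
                target = r + GAMMA * best * (not done)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s, a = s2, a2
    return Q

Q_sarsa = train(on_policy=True)
Q_qlearn = train(on_policy=False)
```

The only difference between the two methods is the bootstrap target: SARSA uses the value of the action the exploring policy actually takes (on-policy), while Q-learning uses the greedy maximum (off-policy), which is one reason the two can trace different trajectories through the same environment.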