Paper Title

IRS Assisted NOMA Aided Mobile Edge Computing with Queue Stability: Heterogeneous Multi-Agent Reinforcement Learning

Authors

Jiadong Yu, Yang Li, Xiaolan Liu, Bo Sun, Yuan Wu, Danny H. K. Tsang

Abstract

By employing powerful edge servers for data processing, mobile edge computing (MEC) has been recognized as a promising technology for supporting emerging computation-intensive applications. In addition, non-orthogonal multiple access (NOMA)-aided MEC systems can further enhance spectral efficiency when massive numbers of tasks are offloaded. However, with more dynamic devices coming online and an uncontrollable stochastic channel environment, it becomes desirable to deploy an appealing technique, namely intelligent reflecting surfaces (IRS), in the MEC system to flexibly tune the communication environment and improve system energy efficiency. In this paper, we investigate joint offloading, communication, and computation resource allocation for an IRS-assisted NOMA MEC system. We first formulate a mixed-integer energy efficiency maximization problem subject to a system queue stability constraint. We then propose the Lyapunov-function-based Mixed Integer Deep Deterministic Policy Gradient (LMIDDPG) algorithm, which is built on a centralized reinforcement learning (RL) framework. Specifically, we design a mixed-integer action space mapping that contains both a continuous mapping and an integer mapping. Moreover, the reward function is defined as the upper bound of the Lyapunov drift-plus-penalty function. To enable end devices (EDs) to choose actions independently at the execution stage, we further propose the Heterogeneous Multi-agent LMIDDPG (HMA-LMIDDPG) algorithm, which is built on a distributed RL framework in which homogeneous EDs and a heterogeneous base station (BS) act as the heterogeneous agents. Numerical results show that the proposed algorithms achieve energy efficiency superior to that of the benchmark algorithms while maintaining queue stability. In particular, the distributed HMA-LMIDDPG structure obtains a larger energy efficiency gain than the centralized LMIDDPG structure.
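The abstract names two mechanisms without giving formulas: a mixed-integer action mapping and a reward derived from the Lyapunov drift-plus-penalty bound. The Python sketch below illustrates how such pieces are commonly built. It is not the paper's formulation. All function names, the linear and floor-based mappings, the queue model, and the weight V are assumptions made for illustration. The reward uses the textbook bound, drift + V * penalty <= B + sum_i Q_i * (a_i - b_i) + V * penalty, with the penalty taken as negative energy efficiency and the constant B dropped.

```python
import math


def map_continuous(a, low, high):
    """Map an actor output a in [-1, 1] linearly onto [low, high] (assumed mapping)."""
    a = max(-1.0, min(1.0, a))  # clip to the assumed actor output range
    return low + (a + 1.0) * 0.5 * (high - low)


def map_integer(a, n_choices):
    """Map an actor output a in [-1, 1] onto an integer index in 0..n_choices-1."""
    a = max(-1.0, min(1.0, a))
    idx = int(math.floor((a + 1.0) * 0.5 * n_choices))
    return min(idx, n_choices - 1)  # a == 1.0 would otherwise overflow the range


def queue_update(q, served, arrived):
    """Standard queue dynamics: Q(t+1) = max(Q(t) - b(t), 0) + a(t)."""
    return max(q - served, 0.0) + arrived


def drift_plus_penalty_reward(queues, served, arrived, energy_efficiency, v):
    """Reward = V * EE - sum_i Q_i * (a_i - b_i).

    This is the negative of the generic drift-plus-penalty upper bound with
    the constant term dropped; the weight v is a hypothetical tuning knob.
    """
    drift_term = sum(q * (a - b) for q, a, b in zip(queues, arrived, served))
    return v * energy_efficiency - drift_term
```

In this sketch a larger v favors energy efficiency, while a smaller v favors draining the queues. How the paper itself sets this trade-off is not stated in the abstract.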
