Deep Reinforcement Learning Approach to MIMO Precoding Problem: Optimality and Robustness

Paper Title

Deep reinforcement learning approach to MIMO precoding problem: Optimality and Robustness

Authors

Heunchul Lee, Maksym Girnyk, Jaeseong Jeong

Abstract

In this paper, we propose a deep reinforcement learning (RL)-based precoding framework that can be used to learn an optimal precoding policy for complex multiple-input multiple-output (MIMO) precoding problems. We model the precoding problem for a single-user MIMO system as an RL problem in which a learning agent sequentially selects precoders to serve the MIMO system environment based on contextual information about the environmental conditions, while adapting its precoder selection policy based on reward feedback from the environment to maximize a numerical reward signal. We develop the RL agent with two canonical deep RL (DRL) algorithms, namely deep Q-network (DQN) and deep deterministic policy gradient (DDPG). To demonstrate the optimality of the proposed DRL-based precoding framework, we explicitly consider a simple MIMO environment for which the optimal solution can be obtained analytically, and show that the DQN- and DDPG-based agents can learn a near-optimal policy that maps the environment state of the MIMO system to a precoder maximizing the reward function in codebook-based and non-codebook-based MIMO precoding systems, respectively. Furthermore, to investigate the robustness of the DRL-based precoding framework, we examine the performance of the two DRL algorithms in a complex MIMO environment for which the optimal solution is not known. The numerical results confirm the effectiveness of the DRL-based precoding framework and show that it can outperform the conventional approximation algorithm in the complex MIMO environment.
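As a rough illustration of the codebook-based setting described in the abstract, the sketch below computes the reward-maximizing codeword for a known channel by exhaustive search, which is the kind of reference a learned precoder-selection policy could be compared against. Everything specific here is an assumption made for illustration and is not taken from the paper: the DFT codebook, the single-stream rate used as the reward, the toy channel, and the function names `dft_codebook`, `reward`, and `best_precoder_index`.

```python
import cmath
import math


def dft_codebook(n_tx):
    """Hypothetical codebook: n_tx unit-norm DFT beamforming vectors."""
    scale = 1.0 / math.sqrt(n_tx)
    return [
        [scale * cmath.exp(2j * math.pi * k * n / n_tx) for n in range(n_tx)]
        for k in range(n_tx)
    ]


def reward(H, w, snr):
    """Assumed reward: single-stream rate log2(1 + snr * ||H w||^2) in bit/s/Hz."""
    gain = sum(abs(sum(h * x for h, x in zip(row, w))) ** 2 for row in H)
    return math.log2(1.0 + snr * gain)


def best_precoder_index(H, codebook, snr):
    """Exhaustive search over the codebook for the reward-maximizing action."""
    rewards = [reward(H, w, snr) for w in codebook]
    best = max(range(len(codebook)), key=rewards.__getitem__)
    return best, rewards


# Toy 2x4 channel (the "environment state"): both receive antennas see a
# channel matched to DFT codeword 1, the second with half the amplitude.
n_tx = 4
row = [cmath.exp(-2j * math.pi * n / n_tx) for n in range(n_tx)]
H = [row, [0.5 * h for h in row]]

codebook = dft_codebook(n_tx)
best, rewards = best_precoder_index(H, codebook, snr=1.0)
print(best, [round(r, 3) for r in rewards])
```

In an RL formulation along the lines the abstract describes, the agent would observe the channel state, pick a codebook index as its action, and receive the reward as feedback; the exhaustive search above only supplies the target that such a policy would ideally reproduce.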

Paper Title