Paper Title
Applicability and Challenges of Deep Reinforcement Learning for Satellite Frequency Plan Design
Paper Authors
Abstract
The study and benchmarking of Deep Reinforcement Learning (DRL) models have become a trend in many industries, including aerospace engineering and communications. Recent studies in these fields propose such models to address certain complex real-time decision-making problems in which classic approaches either do not meet time requirements or fail to obtain optimal solutions. While the good performance of DRL models has been demonstrated for specific use cases or scenarios, most studies do not discuss the compromises and generalizability of such models during real operations. In this paper we explore the tradeoffs of different elements of DRL models and how they might impact the final performance. To that end, we choose the Frequency Plan Design (FPD) problem in the context of multibeam satellite constellations as our use case and propose a DRL model to address it. We identify 6 core elements that have a major effect on its performance: the policy, the policy optimizer, the state, action, and reward representations, and the training environment. We analyze different alternatives for each of these elements and characterize their effects. We also use multiple environments to account for different scenarios, in which we vary the dimensionality or make the environment nonstationary. Our findings show that DRL is a potential method to address the FPD problem in real operations, especially because of its speed in decision-making. However, no single DRL model is able to outperform the rest in all scenarios, and the best approach for each of the 6 core elements depends on the features of the operation environment. While we agree on the potential of DRL to solve future complex problems in the aerospace industry, we also reflect on the importance of designing appropriate models and training procedures, understanding the applicability of such models, and reporting the main performance tradeoffs.
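To make the roles of the state, action, and reward representations concrete, the following minimal Python sketch outlines a toy FPD-style environment in which an agent assigns one frequency slot per beam and is penalized for co-channel conflicts. The class name, problem sizes, and reward shaping are illustrative assumptions for exposition only, not the authors' actual formulation.

import numpy as np

class FrequencyPlanEnv:
    """Toy episodic environment for Frequency Plan Design (FPD).

    Each step assigns a frequency slot to one beam; the reward
    penalizes co-channel conflicts between interfering beams.
    All names and the reward shaping are illustrative assumptions.
    """

    def __init__(self, n_beams=8, n_slots=4, seed=0):
        rng = np.random.default_rng(seed)
        # Symmetric 0/1 interference matrix: 1 if two beams overlap.
        upper = np.triu(rng.integers(0, 2, (n_beams, n_beams)), 1)
        self.interference = upper + upper.T
        self.n_beams, self.n_slots = n_beams, n_slots

    def reset(self):
        self.assignment = np.full(self.n_beams, -1)  # -1 = unassigned
        self.current_beam = 0
        return self._state()

    def _state(self):
        # State representation: one-hot slot assignment per beam plus
        # a normalized pointer to the beam being assigned next.
        onehot = np.zeros((self.n_beams, self.n_slots))
        assigned = self.assignment >= 0
        onehot[assigned, self.assignment[assigned]] = 1.0
        return np.concatenate([onehot.ravel(),
                               [self.current_beam / self.n_beams]])

    def step(self, action):
        # Action representation: the slot index for the current beam.
        self.assignment[self.current_beam] = action
        neighbors = self.interference[self.current_beam] == 1
        conflicts = np.sum(self.assignment[neighbors] == action)
        reward = -float(conflicts)  # penalize co-channel interference
        self.current_beam += 1
        done = self.current_beam == self.n_beams
        return self._state(), reward, done

A policy trained with any standard policy optimizer (e.g., PPO or A2C) would interact with this loop through reset() and step(); the interference penalty here stands in for the real link-budget and bandwidth constraints of a multibeam constellation.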