A State Aggregation Approach for Solving Knapsack Problem with Deep Reinforcement Learning

Paper Title

A State Aggregation Approach for Solving Knapsack Problem with Deep Reinforcement Learning

Authors

Reza Refaei Afshar, Yingqian Zhang, Murat Firat, Uzay Kaymak

Abstract

This paper proposes a Deep Reinforcement Learning (DRL) approach for solving the knapsack problem. The proposed method includes a state aggregation step, based on tabular reinforcement learning, that extracts features and constructs states. The state aggregation policy is applied to each instance of the knapsack problem and is used together with the Advantage Actor-Critic (A2C) algorithm to train a policy that selects items sequentially, one at each time step. The method is a constructive solution approach: item selection is repeated until the final solution is obtained. The experiments show that the approach provides close-to-optimal solutions for all tested instances, outperforms the greedy algorithm, handles larger instances, and is more flexible than an existing DRL approach. In addition, the results demonstrate that the proposed model with the state aggregation strategy not only gives better solutions but also learns in fewer timesteps than the model without state aggregation.
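The constructive scheme described in the abstract (pick one item per time step until nothing more fits) can be illustrated with a minimal Python sketch. It does not implement the paper's A2C policy or its state aggregation step. The names `construct_solution`, `greedy_choice`, and `optimal_value` are hypothetical, and the greedy rule used here (highest value-to-weight ratio) is an assumption, since the abstract does not specify which greedy algorithm serves as the baseline. A small dynamic program is included only to show what "close to optimal" is measured against.

```python
from typing import Callable, List, Sequence, Tuple

Item = Tuple[int, int]  # (value, weight); weights are assumed to be positive integers


def construct_solution(
    items: Sequence[Item],
    capacity: int,
    choose: Callable[[Sequence[Item], List[int], int], int],
) -> List[int]:
    """Build a solution one item per step until no remaining item fits.

    `choose` stands in for a selection policy (in the paper, a trained
    A2C policy; here, any function that returns one feasible index).
    """
    remaining = capacity
    selected: List[int] = []
    candidates = set(range(len(items)))
    while True:
        # Only items that still fit in the knapsack are feasible actions.
        feasible = [i for i in sorted(candidates) if items[i][1] <= remaining]
        if not feasible:
            break
        i = choose(items, feasible, remaining)
        selected.append(i)
        remaining -= items[i][1]
        candidates.remove(i)
    return selected


def greedy_choice(items: Sequence[Item], feasible: List[int], remaining: int) -> int:
    """Assumed greedy baseline: pick the feasible item with the best value/weight ratio."""
    return max(feasible, key=lambda i: items[i][0] / items[i][1])


def optimal_value(items: Sequence[Item], capacity: int) -> int:
    """Exact 0/1 knapsack value via dynamic programming over integer capacities."""
    best = [0] * (capacity + 1)
    for value, weight in items:
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]
```

Passing a different `choose` function to `construct_solution` swaps the selection rule without touching the construction loop, which is where a learned policy would plug in under this sketch.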
