Online 3D Bin Packing with Constrained Deep Reinforcement Learning

Paper title

Online 3D Bin Packing with Constrained Deep Reinforcement Learning

Authors

Hang Zhao, Qijin She, Chenyang Zhu, Yin Yang, Kai Xu

Abstract

We solve a challenging yet practically useful variant of the 3D Bin Packing Problem (3D-BPP). In our problem, the agent has limited information about the items to be packed into the bin, and an item must be packed immediately after its arrival without buffering or readjusting. The item's placement is also subject to the constraints of collision avoidance and physical stability. We formulate this online 3D-BPP as a constrained Markov decision process. To solve the problem, we propose an effective and easy-to-implement constrained deep reinforcement learning (DRL) method under the actor-critic framework. In particular, we introduce a feasibility predictor to predict the feasibility mask for the placement actions and use it to modulate the action probabilities output by the actor during training. Such supervision and transformation of DRL help the agent learn feasible policies efficiently. Our method can also be generalized, e.g., to handle lookahead or items with different orientations. We have conducted an extensive evaluation showing that the learned policy significantly outperforms state-of-the-art methods. A user study suggests that our method attains human-level performance.
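
The feasibility-mask idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the heightmap representation of the bin, and the feasibility rule (the item must stay inside the bin, rest on a completely flat surface, and not exceed the bin height) are all assumptions made for illustration. In the paper the mask is predicted by a learned component, whereas here it is computed by hand, and "modulating" the action probabilities is approximated by zeroing infeasible actions and renormalizing.

```python
import math


def feasibility_mask(heightmap, item, bin_height):
    """Return a 0/1 mask over candidate placement cells.

    heightmap[r][c] is the current stack height at grid cell (r, c).
    item is (length, width, height) in grid units.
    Assumed rule (hypothetical, simpler than a real stability check):
    the footprint must lie inside the bin, sit on a flat surface,
    and the item top must not exceed bin_height.
    """
    rows, cols = len(heightmap), len(heightmap[0])
    length, width, height = item
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows - length + 1):
        for c in range(cols - width + 1):
            cells = [heightmap[i][j]
                     for i in range(r, r + length)
                     for j in range(c, c + width)]
            base = max(cells)
            flat = min(cells) == base          # full, level support
            fits = base + height <= bin_height  # stays below the bin top
            mask[r][c] = 1 if (flat and fits) else 0
    return mask


def masked_action_probs(logits, mask):
    """Softmax over logits restricted to actions whose mask entry is 1.

    logits and mask are flat lists of equal length. Infeasible actions
    get probability 0 and the rest are renormalized to sum to 1.
    """
    feasible = [l for l, m in zip(logits, mask) if m]
    if not feasible:
        raise ValueError("no feasible placement for this item")
    peak = max(feasible)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) if m else 0.0
               for l, m in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    heightmap = [[0, 1],
                 [0, 0]]
    mask = feasibility_mask(heightmap, item=(1, 2, 1), bin_height=2)
    flat_mask = [m for row in mask for m in row]
    # Hypothetical actor outputs, one logit per grid cell.
    print(masked_action_probs([5.0, 1.0, 2.0, 3.0], flat_mask))
```

In a training loop, the masked probabilities would replace the raw actor output when sampling a placement, so that the agent only explores placements the mask considers feasible.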
