Paper Title


Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image

Authors

Danny Driess, Jung-Su Ha, Marc Toussaint

Abstract


In this paper, we propose a deep convolutional recurrent neural network that predicts action sequences for task and motion planning (TAMP) from an initial scene image. Typical TAMP problems are formalized by combining reasoning on a symbolic, discrete level (e.g. first-order logic) with continuous motion planning such as nonlinear trajectory optimization. Due to the great combinatorial complexity of possible discrete action sequences, a large number of optimization/motion planning problems have to be solved to find a solution, which limits the scalability of these approaches. To circumvent this combinatorial complexity, we develop a neural network which, based on an initial image of the scene, directly predicts promising discrete action sequences such that ideally only one motion planning problem has to be solved to find a solution to the overall TAMP problem. A key aspect is that our method generalizes to scenes with many and varying numbers of objects, despite being trained on only two objects at a time. This is made possible by encoding the objects of the scene as images input to the neural network, instead of as a fixed feature vector. Results show runtime improvements of several orders of magnitude. Video: https://youtu.be/i8yyEbbvoEk
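To make the described architecture concrete, the following is a minimal sketch (not the authors' code) of the general idea: a convolutional encoder processes the initial scene image, and a recurrent head produces logits over a small set of discrete actions at each step of the sequence. All class names, layer sizes, and the action-set size here are illustrative assumptions.

```python
# Hypothetical sketch of a CNN + RNN action-sequence predictor.
# Layer sizes, sequence length, and the number of discrete actions
# are placeholder assumptions, not values from the paper.
import torch
import torch.nn as nn

class ActionSequencePredictor(nn.Module):
    def __init__(self, n_actions=4, hidden=64):
        super().__init__()
        # Convolutional encoder over the initial scene image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, hidden),
        )
        # Recurrent head scores a discrete action at each sequence step.
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, scene, seq_len=3):
        # scene: (B, 3, H, W) initial scene image; per-object image
        # channels (as in the paper) are assumed folded into the input.
        z = self.encoder(scene)                     # (B, hidden)
        # Feed the scene embedding at every step; the GRU state
        # carries the context of previously predicted actions.
        inp = z.unsqueeze(1).repeat(1, seq_len, 1)  # (B, T, hidden)
        h, _ = self.rnn(inp)
        return self.head(h)                         # (B, T, n_actions)

model = ActionSequencePredictor()
logits = model(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 3, 4])
```

At inference, the top-scoring action sequence would be handed to a single nonlinear trajectory optimization, which is what lets the method avoid enumerating the combinatorial space of discrete sequences.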
