Translating a Visual LEGO Manual to a Machine-Executable Plan

Paper Title

Translating a Visual LEGO Manual to a Machine-Executable Plan

Authors

Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, Jiajun Wu

Abstract

We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions. We formulate this problem as a sequential prediction task: at each step, our model reads the manual, locates the components to be added to the current shape, and infers their 3D poses. This task poses the challenge of establishing a 2D-3D correspondence between the manual image and the real 3D object, and 3D pose estimation for unseen 3D objects, since a new component to be added in a step can be an object built from previous steps. To address these two challenges, we present a novel learning-based framework, the Manual-to-Executable-Plan Network (MEPNet), which reconstructs the assembly steps from a sequence of manual images. The key idea is to integrate neural 2D keypoint detection modules and 2D-3D projection algorithms for high-precision prediction and strong generalization to unseen components. The MEPNet outperforms existing methods on three newly collected LEGO manual datasets and a Minecraft house dataset.
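
The 2D-3D projection idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a simple pinhole camera with made-up parameters (`focal`, `center`, `cam_z`), a hypothetical discrete grid of candidate brick positions, and a 2D keypoint that some detector has already produced. It lifts the keypoint to 3D by choosing the grid position whose projection lands closest to it.

```python
from itertools import product


def project(point, focal=500.0, center=(256.0, 256.0), cam_z=20.0):
    """Pinhole projection of a 3D point to pixel coordinates.

    Assumed setup (not from the paper): the camera sits at z = -cam_z
    and looks along +z, so the depth of a point is z + cam_z.
    """
    x, y, z = point
    depth = z + cam_z
    return (focal * x / depth + center[0], focal * y / depth + center[1])


def lift_keypoint(keypoint_2d, candidates):
    """Return the candidate 3D position whose projection is nearest the keypoint."""
    def squared_error(candidate):
        u, v = project(candidate)
        return (u - keypoint_2d[0]) ** 2 + (v - keypoint_2d[1]) ** 2

    return min(candidates, key=squared_error)


# Hypothetical discrete grid of positions a new component could occupy.
grid = list(product(range(-4, 5), range(-4, 5), range(0, 4)))

# Simulate a slightly noisy detection of a component placed at true_pos.
true_pos = (2, -1, 3)
u, v = project(true_pos)
detected = (u + 0.4, v - 0.3)

recovered = lift_keypoint(detected, grid)
print(recovered)
```

Restricting the search to a discrete set of valid positions is what makes the lifting step tractable in this toy setting. With larger pixel noise, candidates at different depths along the same viewing ray become hard to tell apart, so a real system would need additional constraints.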
