Paper Title

I Like to Move It: 6D Pose Estimation as an Action Decision Process

Paper Authors

Benjamin Busam, Hyun Jun Jung, Nassir Navab

Paper Abstract

Object pose estimation is an integral part of robot vision and AR. Previous 6D pose retrieval pipelines treat the problem either as a regression task or discretize the pose space to classify. We change this paradigm and reformulate the problem as an action decision process where an initial pose is updated in incremental discrete steps that sequentially move a virtual 3D rendering towards the correct solution. A neural network iteratively estimates likely moves from a single RGB image and thus determines an acceptable final pose. In comparison to other approaches that train object-specific pose models, we learn a decision process. This allows for a lightweight architecture that naturally generalizes to unseen objects. A coherent stop action for process termination enables a dynamic reduction of the computation cost when there are insignificant changes in a video sequence. Instead of a static inference time, the runtime thus adapts automatically to the object motion. Robustness and accuracy of our action decision network are evaluated on Laval and YCB video scenes, where we significantly improve over the state of the art.
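To make the abstract's idea concrete, the following is a minimal sketch of such an action decision loop: a policy network repeatedly chooses one discrete move (or "stop") given the observed RGB image and a rendering of the current pose hypothesis, and the 6D pose is updated incrementally until the stop action terminates the process. All names here (`policy_net`, `render`, `ROT_STEP`, `TRANS_STEP`, the action set) are illustrative assumptions, not the paper's actual interface or step sizes.

```python
import numpy as np

ROT_STEP = np.deg2rad(1.0)   # assumed incremental rotation per action (radians)
TRANS_STEP = 0.005           # assumed incremental translation per action (meters)

# Discrete action set: +/- step around/along each axis, plus a stop action.
ACTIONS = [("rot", axis, s) for axis in range(3) for s in (+1, -1)] \
        + [("trans", axis, s) for axis in range(3) for s in (+1, -1)] \
        + [("stop", None, 0)]

def small_axis_rotation(axis_idx, angle):
    """Rotation matrix for a small rotation about one coordinate axis."""
    c, s = np.cos(angle), np.sin(angle)
    i, j = [(1, 2), (0, 2), (0, 1)][axis_idx]
    R = np.eye(3)
    R[i, i], R[i, j], R[j, i], R[j, j] = c, -s, s, c
    return R

def estimate_pose(image, R, t, policy_net, render, max_steps=100):
    """Iteratively refine the pose hypothesis (R, t) with network-chosen moves.

    `policy_net(image, rendering)` is assumed to return an index into ACTIONS;
    `render(R, t)` is assumed to return a virtual rendering of the hypothesis.
    """
    for _ in range(max_steps):
        rendering = render(R, t)                 # virtual 3D view of current hypothesis
        kind, axis, sign = ACTIONS[policy_net(image, rendering)]
        if kind == "stop":                       # coherent stop action ends refinement
            break
        if kind == "rot":
            R = small_axis_rotation(axis, sign * ROT_STEP) @ R
        else:
            t = t.copy()
            t[axis] += sign * TRANS_STEP
    return R, t
```

Because the loop only ever applies small generic moves, the network reasons about relative alignment rather than a specific object, which is one way to read the paper's claim that the learned decision process generalizes to unseen objects and that the number of iterations (and hence runtime) adapts to how far the pose has drifted.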
