Paper Title
PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation
Paper Authors
Paper Abstract
Accurate 6D object pose estimation is an important task for a variety of robotic applications such as grasping or localization. It is a challenging task due to object symmetries, clutter, and occlusion, but it becomes more challenging when additional information, such as depth and 3D models, is not provided. We present a transformer-based approach that takes an RGB image as input and predicts a 6D pose for each object in the image. Besides the image, our network does not require any additional information such as depth maps or 3D object models. First, the image is passed through an object detector to generate feature maps and to detect objects. Then, the feature maps are fed into a transformer with the detected bounding boxes as additional information. Afterwards, the output object queries are processed by a separate translation and rotation head. We achieve state-of-the-art results for RGB-only approaches on the challenging YCB-V dataset. We illustrate the suitability of the resulting model as a pose sensor for a 6-DoF state estimation task. Code is available at https://github.com/aau-cns/poet.
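The data flow described in the abstract (detector → feature maps and boxes → transformer → per-object queries → separate translation and rotation heads) can be sketched as below. This is a minimal shape-level illustration, not the authors' implementation: every function here is a toy stand-in, and the 6D rotation parameterization with Gram–Schmidt orthogonalization is an assumption about how a rotation head might produce a valid rotation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def detect(image):
    """Toy stand-in for the object detector: a feature map plus one box per object."""
    feat = rng.standard_normal((32, image.shape[0] // 8, image.shape[1] // 8))
    boxes = np.array([[0.2, 0.3, 0.4, 0.5],   # (cx, cy, w, h), normalized
                      [0.6, 0.6, 0.2, 0.3]])
    return feat, boxes

def transformer(feat, boxes, d_model=32):
    """Toy stand-in for the transformer: one output query per detected box.

    Globally pools the feature map and mixes in the box coordinates, as a
    placeholder for attention conditioned on the bounding boxes.
    """
    pooled = feat.mean(axis=(1, 2))                        # (d_model,)
    box_proj = boxes @ rng.standard_normal((4, d_model))   # (n_obj, d_model)
    return pooled + box_proj                               # (n_obj, d_model)

def translation_head(queries):
    """Linear head mapping each object query to a translation t = (x, y, z)."""
    W = rng.standard_normal((queries.shape[-1], 3))
    return queries @ W                                     # (n_obj, 3)

def rotation_head(queries):
    """Head regressing a 6D rotation representation, orthogonalized to a 3x3 matrix."""
    W = rng.standard_normal((queries.shape[-1], 6))
    six = queries @ W                                      # (n_obj, 6)
    a, b = six[:, :3], six[:, 3:]
    r1 = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b - (r1 * b).sum(axis=1, keepdims=True) * r1       # Gram-Schmidt step
    r2 = b / np.linalg.norm(b, axis=1, keepdims=True)
    r3 = np.cross(r1, r2)
    return np.stack([r1, r2, r3], axis=1)                  # (n_obj, 3, 3)

image = rng.standard_normal((480, 640, 3))   # RGB input only
feat, boxes = detect(image)
queries = transformer(feat, boxes)
t = translation_head(queries)                # one translation per object
R = rotation_head(queries)                   # one rotation matrix per object
```

The orthogonalization step guarantees each predicted `R` is a proper rotation (orthonormal, determinant +1), which is one common way to make a regressed rotation usable directly as a 6-DoF pose output.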