Paper Title

Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras

Paper Authors

Iretiayo Akinola, Jacob Varley, Dmitry Kalashnikov

Paper Abstract

In this work, we present an effective multi-view approach to closed-loop end-to-end learning of precise manipulation tasks that are 3D in nature. Our method learns to accomplish these tasks using multiple statically placed but uncalibrated RGB camera views, without building an explicit 3D representation such as a pointcloud or voxel grid. This multi-camera approach achieves superior task performance on difficult stacking and insertion tasks compared to single-view baselines. Single-view robotic agents struggle with occlusion and with estimating relative poses between points of interest. While full 3D scene representations (voxels or pointclouds) are obtainable from the registered output of multiple depth sensors, several challenges complicate operating off such explicit 3D representations. These challenges include imperfect camera calibration, poor depth maps due to object properties such as reflective surfaces, and slower inference speeds over 3D representations compared to 2D images. Our use of static but uncalibrated cameras requires neither camera-robot nor camera-camera calibration, making the proposed approach easy to set up, and our use of sensor dropout during training makes it resilient to the loss of camera views after deployment.
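The abstract's sensor-dropout idea can be illustrated with a minimal sketch. The paper does not specify its exact architecture, so the function below is a hypothetical stand-in: it randomly zeroes out per-camera feature vectors during training while guaranteeing that at least one view survives, which is the property that makes the policy robust to losing a camera at deployment.

```python
import random

def sensor_dropout(camera_features, drop_prob=0.3, rng=None):
    """Randomly zero out per-camera feature vectors during training.

    camera_features: list of per-camera feature vectors (lists of floats).
    drop_prob: probability of dropping each camera independently.
    Always keeps at least one camera so the agent never sees empty input.
    (Illustrative sketch only; not the paper's actual implementation.)
    """
    rng = rng or random.Random()
    keep = [rng.random() >= drop_prob for _ in camera_features]
    if not any(keep):
        # Guarantee at least one surviving view.
        keep[rng.randrange(len(camera_features))] = True
    return [
        feat if k else [0.0] * len(feat)
        for feat, k in zip(camera_features, keep)
    ]
```

At test time the same mechanism lets the policy run with any subset of cameras: a missing view is simply presented as a zeroed feature vector, the condition the network already saw during training.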
