Paper Title

Certifiable 3D Object Pose Estimation: Foundations, Learning Models, and Self-Training

Authors

Rajat Talak, Lisa Peng, Luca Carlone

Abstract

We consider a certifiable object pose estimation problem, where -- given a partial point cloud of an object -- the goal is to not only estimate the object pose, but also to provide a certificate of correctness for the resulting estimate. Our first contribution is a general theory of certification for end-to-end perception models. In particular, we introduce the notion of $ζ$-correctness, which bounds the distance between an estimate and the ground truth. We show that $ζ$-correctness can be assessed by implementing two certificates: (i) a certificate of observable correctness, that asserts if the model output is consistent with the input data and prior information, (ii) a certificate of non-degeneracy, that asserts whether the input data is sufficient to compute a unique estimate. Our second contribution is to apply this theory and design a new learning-based certifiable pose estimator. We propose C-3PO, a semantic-keypoint-based pose estimation model, augmented with the two certificates, to solve the certifiable pose estimation problem. C-3PO also includes a keypoint corrector, implemented as a differentiable optimization layer, that can correct large detection errors (e.g. due to the sim-to-real gap). Our third contribution is a novel self-supervised training approach that uses our certificate of observable correctness to provide the supervisory signal to C-3PO during training. In it, the model trains only on the observably correct input-output pairs, in each training iteration. As training progresses, we see that the observably correct input-output pairs grow, eventually reaching near 100% in many cases. Our experiments show that (i) standard semantic-keypoint-based methods outperform more recent alternatives, (ii) C-3PO further improves performance and significantly outperforms all the baselines, and (iii) C-3PO's certificates are able to discern correct pose estimates.
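To make the idea of training only on observably correct pairs more concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not say how the certificate is computed, so the check below (every observed point must lie within a threshold `eps` of the posed object model) is an assumption, and the names `observably_correct`, `select_for_training`, and `eps` are hypothetical.

```python
import math


def transform(points, R, t):
    """Apply the rigid transform x -> R x + t to each 3D point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def observably_correct(observed, model, R, t, eps):
    """Hypothetical certificate: is the estimate consistent with the input?

    Assumption: consistency means every observed point lies within eps of
    some point of the object model posed by (R, t). The distance is
    one-sided (observed -> model) because the input is a partial point cloud.
    """
    posed = transform(model, R, t)
    worst = max(min(math.dist(x, y) for y in posed) for x in observed)
    return worst < eps


def select_for_training(batch, model, eps):
    """Keep only samples whose pose estimate passes the certificate."""
    return [
        s for s in batch
        if observably_correct(s["points"], model, s["R"], s["t"], eps)
    ]


# Toy data: a four-point model, observed partially under a pure translation.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
MODEL = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
OBSERVED = [(1.0, 2.0, 3.0), (2.0, 2.0, 3.0)]

good = {"points": OBSERVED, "R": IDENTITY, "t": (1.0, 2.0, 3.0)}
bad = {"points": OBSERVED, "R": IDENTITY, "t": (1.0, 2.0, 4.0)}
kept = select_for_training([good, bad], MODEL, eps=0.05)
```

In a self-training loop, a filter like `select_for_training` would run at every iteration, so the loss is computed only on the pairs that pass the check. A check of this kind says nothing about non-degeneracy, which the abstract treats as a separate certificate.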