Paper title
Optimal and Robust Category-level Perception: Object Pose and Shape Estimation from 2D and 3D Semantic Keypoints
Paper authors
Paper abstract
We consider a category-level perception problem, where one is given 2D or 3D sensor data picturing an object of a given category (e.g., a car), and has to reconstruct the 3D pose and shape of the object despite intra-class variability (i.e., different car models have different shapes). We consider an active shape model, where -- for an object category -- we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation where pose and shape are estimated from 2D or 3D keypoints via non-convex optimization. Our first contribution is to develop PACE3D* and PACE2D*, the first certifiably optimal solvers for pose and shape estimation using 3D and 2D keypoints, respectively. Both solvers rely on the design of tight (i.e., exact) semidefinite relaxations. Our second contribution is to develop outlier-robust versions of both solvers, named PACE3D# and PACE2D#. Towards this goal, we propose ROBIN, a general graph-theoretic framework to prune outliers, which uses compatibility hypergraphs to model measurements' compatibility. We show that in category-level perception problems these hypergraphs can be built from the winding orders of the keypoints (in 2D) or their convex hulls (in 3D), and many outliers can be filtered out via maximum hyperclique computation. The last contribution is an extensive experimental evaluation. Besides providing an ablation study on simulated datasets and on the PASCAL3D+ dataset, we combine our solver with a deep keypoint detector, and show that PACE3D# improves over the state of the art in vehicle pose estimation in the ApolloScape datasets, and its runtime is compatible with practical applications. We release our code at https://github.com/MIT-SPARK/PACE.
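As a rough illustration of the active shape model mentioned in the abstract, the sketch below blends the keypoints of a small CAD library into one shape, applies a rigid pose, and evaluates a least-squares keypoint residual. It rests on assumptions not stated in the abstract: the shape is taken to be a convex combination of the library models, the toy `LIBRARY` values are invented, and all function names (`blend_shape`, `pose_shape_to_keypoints`, `keypoint_cost`) are hypothetical. It is not the PACE3D* solver, which relies on a semidefinite relaxation; it only shows the kind of cost such a solver would minimize over pose and shape.

```python
import math

def rot_z(theta):
    """Rotation matrix about the z-axis (used here as a simple example pose)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_vec(R, p):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[r][k] * p[k] for k in range(3)) for r in range(3)]

def blend_shape(library, coeffs):
    """Assumed active shape model: convex combination of the library keypoints."""
    assert len(library) == len(coeffs)
    assert all(c >= 0.0 for c in coeffs) and abs(sum(coeffs) - 1.0) < 1e-9
    n_keypoints = len(library[0])
    return [
        [sum(c * model[i][a] for c, model in zip(coeffs, library)) for a in range(3)]
        for i in range(n_keypoints)
    ]

def pose_shape_to_keypoints(library, coeffs, R, t):
    """Predict 3D keypoints: rotate the blended shape, then translate it."""
    return [
        [v + t_a for v, t_a in zip(mat_vec(R, p), t)]
        for p in blend_shape(library, coeffs)
    ]

def keypoint_cost(library, coeffs, R, t, measured):
    """Sum of squared distances between predicted and measured keypoints."""
    predicted = pose_shape_to_keypoints(library, coeffs, R, t)
    return sum(
        (p[a] - m[a]) ** 2 for p, m in zip(predicted, measured) for a in range(3)
    )

# Toy library (invented values): two "car" models with five keypoints each.
SEDAN = [[2.0, 1.0, 0.0], [2.0, -1.0, 0.0], [-2.0, 1.0, 0.0], [-2.0, -1.0, 0.0], [0.0, 0.0, 1.4]]
VAN = [[2.4, 1.0, 0.0], [2.4, -1.0, 0.0], [-2.4, 1.0, 0.0], [-2.4, -1.0, 0.0], [0.0, 0.0, 2.0]]
LIBRARY = [SEDAN, VAN]

# Noise-free measurements generated from a ground-truth pose and shape.
TRUE_COEFFS = [0.25, 0.75]
TRUE_R = rot_z(0.3)
TRUE_T = [1.0, 2.0, 0.5]
measured = pose_shape_to_keypoints(LIBRARY, TRUE_COEFFS, TRUE_R, TRUE_T)

cost_at_truth = keypoint_cost(LIBRARY, TRUE_COEFFS, TRUE_R, TRUE_T, measured)
cost_wrong_shape = keypoint_cost(LIBRARY, [1.0, 0.0], TRUE_R, TRUE_T, measured)
```

In this toy setup the cost vanishes at the ground-truth pose and shape and becomes positive when the wrong shape coefficients are used. The cost is non-convex in the rotation, which is why the abstract emphasizes certifiably optimal solvers.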