Paper title
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild
Paper authors
Abstract
Detecting objects and estimating their viewpoints in images are key tasks of 3D scene understanding. Recent approaches have achieved excellent results on very large benchmarks for object detection and viewpoint estimation. However, performance still lags behind for novel object categories with few samples. In this paper, we tackle the problems of few-shot object detection and few-shot viewpoint estimation. We demonstrate on both tasks the benefits of guiding the network prediction with class-representative features extracted from data in different modalities: image patches for object detection, and aligned 3D models for viewpoint estimation. Despite its simplicity, our method outperforms state-of-the-art methods by a large margin on a range of datasets, including PASCAL and COCO for few-shot object detection, and Pascal3D+ and ObjectNet3D for few-shot viewpoint estimation. Furthermore, when the 3D model is not available, we introduce a simple category-agnostic viewpoint estimation method that exploits geometrical similarities and consistent pose labelling across different classes. While this moderately reduces performance, the approach still obtains better results than previous methods in this setting. Finally, for the first time, we tackle the combination of both few-shot tasks on three challenging benchmarks for viewpoint estimation in the wild, ObjectNet3D, Pascal3D+ and Pix3D, showing very promising results.
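The idea of guiding a prediction with class-representative features can be illustrated with a toy sketch. The abstract does not specify the architecture, so the following is an assumption for illustration only: the class feature is taken as the mean of a few support feature vectors, and the guidance is a channel-wise product with the query features. The names `class_prototype` and `guide_query` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def class_prototype(support_features: Sequence[Sequence[float]]) -> Vector:
    """Average a few support feature vectors into one class-representative vector."""
    if not support_features:
        raise ValueError("need at least one support feature vector")
    dim = len(support_features[0])
    if any(len(f) != dim for f in support_features):
        raise ValueError("all support feature vectors must have the same length")
    n = len(support_features)
    return [sum(f[i] for f in support_features) / n for i in range(dim)]


def guide_query(query: Sequence[float], prototype: Sequence[float]) -> Vector:
    """Assumed guidance step: reweight query features channel-wise by the prototype."""
    if len(query) != len(prototype):
        raise ValueError("query and prototype must have the same length")
    return [q * p for q, p in zip(query, prototype)]


# Three "shots" of a novel class, each already encoded as a 4-d feature vector
# (in the abstract's setting these would come from image patches or 3D models).
shots = [[1.0, 0.0, 2.0, 4.0], [3.0, 0.0, 2.0, 0.0], [2.0, 0.0, 2.0, 2.0]]
proto = class_prototype(shots)
guided = guide_query([0.5, 1.0, 1.0, 2.0], proto)
```

In a real system the guided features would feed a detection or viewpoint head, and the feature extractor would be learned. This sketch only shows how a handful of examples can be condensed into one vector that conditions the prediction for a query.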