Object-aware Feature Aggregation for Video Object Detection

Paper Title

Object-aware Feature Aggregation for Video Object Detection

Authors

Geng, Qichuan; Zhang, Hong; Jiang, Na; Qi, Xiaojuan; Zhang, Liangjun; Zhou, Zhong

Abstract

We present an Object-aware Feature Aggregation (OFA) module for video object detection (VID). Our approach is motivated by the intriguing property that video-level object-aware knowledge can serve as a powerful semantic prior for object recognition. Consequently, augmenting features with such prior knowledge can effectively improve both classification and localization performance. To give features access to more content from the whole video, we first capture the object-aware knowledge of proposals and then combine it with well-established pair-wise contexts. Extensive experiments on the ImageNet VID dataset demonstrate the effectiveness of object-aware knowledge: our approach achieves 83.93% and 86.09% mAP with ResNet-101 and ResNeXt-101 backbones, respectively. When further equipped with Sequence DIoU NMS, it obtains 85.07% and 86.88% mAP, the best reported results at the time of submission. The code to reproduce our results will be released upon acceptance.
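The abstract only outlines how object-aware knowledge is combined with pair-wise contexts. As a rough, hypothetical illustration of that idea, the sketch below assumes that the pair-wise context is scaled dot-product attention from each proposal in the current frame over proposals sampled from other frames, and that the video-level object-aware knowledge is simply the mean of those sampled proposal features. Both choices, and the function name `object_aware_aggregate`, are assumptions made for illustration; this is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    # Inner product of two equal-length feature vectors.
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def object_aware_aggregate(target: List[Vector], support: List[Vector]) -> List[Vector]:
    """Hypothetical OFA-style aggregation (illustrative sketch only).

    target:  features of proposals in the current frame (N x D).
    support: features of proposals sampled from other frames (M x D).
    Returns N vectors of length 2 * D: the pair-wise context feature
    concatenated with a video-level descriptor.
    """
    d = len(target[0])
    scale = 1.0 / math.sqrt(d)
    # ASSUMPTION: video-level object-aware knowledge = mean support feature,
    # shared by every target proposal.
    prior = [sum(f[k] for f in support) / len(support) for k in range(d)]
    out = []
    for t in target:
        # Pair-wise context: attention weights of this proposal over all support proposals.
        weights = softmax([dot(t, s) * scale for s in support])
        context = [sum(w * s[k] for w, s in zip(weights, support)) for k in range(d)]
        # Residual connection keeps the original proposal feature.
        pairwise = [tk + ck for tk, ck in zip(t, context)]
        out.append(pairwise + prior)  # list concatenation -> length 2 * D
    return out


if __name__ == "__main__":
    print(object_aware_aggregate([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]))
    # → [[2.0, 0.0, 1.0, 0.0]]
```

Concatenating the video-level descriptor, rather than adding it, keeps the pair-wise and video-level signals separate so that a downstream classifier or box regressor could weight them independently. This is again a design assumption of the sketch, not a detail stated in the abstract.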
