Paper Title

Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion

Authors

Gongjie Zhang, Zhipeng Luo, Jiaxing Huang, Shijian Lu, Eric P. Xing

Abstract

The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. However, DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We observe that DETR's slow convergence is largely attributable to the difficulty of matching object queries to relevant regions, which stems from the unaligned semantics between object queries and encoded image features. Based on this observation, we design Semantic-Aligned-Matching DETR++ (SAM-DETR++) to accelerate DETR's convergence and improve detection performance. The core of SAM-DETR++ is a plug-and-play module that projects object queries and encoded image features into the same feature embedding space, where each object query can be easily matched to relevant regions with similar semantics. In addition, SAM-DETR++ searches for multiple representative keypoints and exploits their features for semantic-aligned matching with enhanced representation capacity. Furthermore, SAM-DETR++ can effectively fuse multi-scale features in a coarse-to-fine manner on the basis of the designed semantic-aligned matching. Extensive experiments show that the proposed SAM-DETR++ achieves superior convergence speed and competitive detection accuracy. Additionally, as a plug-and-play method, SAM-DETR++ can complement existing DETR convergence solutions with even better performance, achieving 44.8% AP with merely 12 training epochs and 49.1% AP with 50 training epochs on COCO val2017 with ResNet-50. Code is available at https://github.com/ZhangGongjie/SAM-DETR.
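The matching idea described in the abstract can be illustrated with a toy sketch. This is not the SAM-DETR++ implementation; it only shows why dot-product matching becomes easy once queries and region features share one embedding space. The function names (`project`, `match_queries`), the shared projection matrix, and the 3-dimensional toy vectors are all assumptions made for illustration.

```python
import math

def project(vec, weight):
    """Apply a linear projection; `weight` is a list of rows (hypothetical shared projection)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def match_queries(queries, features, weight):
    """For each query, return attention weights over the region features.

    Both inputs go through the SAME projection, so their dot products
    compare like with like (the 'same embedding space' assumption).
    """
    q_proj = [project(q, weight) for q in queries]
    f_proj = [project(f, weight) for f in features]
    scale = math.sqrt(len(weight))  # scaled dot-product, as in standard attention
    attention = []
    for q in q_proj:
        scores = [sum(a * b for a, b in zip(q, f)) / scale for f in f_proj]
        attention.append(softmax(scores))
    return attention

# Toy data: three region features and two object queries (illustrative values only).
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

attention = match_queries(queries, features, identity)
best = [row.index(max(row)) for row in attention]  # index of the best-matching region per query
```

In this sketch each query places its largest attention weight on the region whose feature is most similar to it. If queries and features were instead embedded by unrelated projections, the dot products would carry little meaning, which is the misalignment the abstract identifies as a cause of slow convergence.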
