Constrained Sampling for Class-Agnostic Weakly Supervised Object Localization

Paper Title

Constrained Sampling for Class-Agnostic Weakly Supervised Object Localization

Authors

Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Aydin Sarraf, Eric Granger

Abstract

Self-supervised vision transformers can generate accurate localization maps of the objects in an image. However, because they decompose the scene into multiple maps containing various objects and do not rely on any explicit supervisory signal, they cannot distinguish the object of interest from other objects, as required in weakly supervised object localization (WSOL). To address this issue, we propose leveraging the multiple maps generated by the different transformer heads to acquire pseudo-labels for training a WSOL model. In particular, a new discriminative proposal sampling method is introduced that relies on a pretrained CNN classifier to identify discriminative regions. Foreground and background pixels are then sampled from these regions to train a WSOL model that generates activation maps capable of accurately localizing objects belonging to a specific class. Empirical results on the challenging CUB benchmark dataset indicate that our proposed approach can outperform state-of-the-art methods over a wide range of threshold values. Our method provides class activation maps with better coverage of foreground object regions relative to the background.
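The sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the quantile thresholds, the pixel counts, and the use of a precomputed score per map (standing in for the pretrained CNN classifier's confidence on each head's proposal) are all assumptions made for illustration only.

```python
import numpy as np


def select_map(maps, scores):
    """Return the attention map whose proposal received the highest score.

    `scores` is a hypothetical stand-in for the classifier confidence
    assigned to the region proposed by each transformer head.
    """
    return maps[int(np.argmax(scores))]


def sample_pseudo_labels(attn_map, n_fg=5, n_bg=5,
                         fg_quantile=0.9, bg_quantile=0.3, rng=None):
    """Sample sparse foreground/background pixel labels from one map.

    Returns an array with the same shape as `attn_map` holding
    1 (foreground), 0 (background), or -1 (pixel not sampled).
    The quantile thresholds are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = attn_map.ravel()

    # Candidate pools: strongest activations for foreground,
    # weakest activations for background.
    fg_pool = np.flatnonzero(flat >= np.quantile(flat, fg_quantile))
    bg_pool = np.flatnonzero(flat <= np.quantile(flat, bg_quantile))

    fg_idx = rng.choice(fg_pool, size=min(n_fg, fg_pool.size), replace=False)
    bg_idx = rng.choice(bg_pool, size=min(n_bg, bg_pool.size), replace=False)

    labels = np.full(flat.shape, -1, dtype=np.int64)
    labels[bg_idx] = 0
    labels[fg_idx] = 1  # foreground wins if the pools ever overlap
    return labels.reshape(attn_map.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    head_maps = rng.random((6, 8, 8))                      # 6 heads, 8x8 maps
    head_scores = np.array([0.1, 0.7, 0.2, 0.05, 0.3, 0.4])  # made-up scores
    pseudo = sample_pseudo_labels(select_map(head_maps, head_scores), rng=rng)
    print(pseudo)
```

In such a setup, only the sampled pixels would contribute to the training loss of the localization model, while pixels marked -1 would be ignored.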
