Paper Title

A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes

Paper Authors

Mazda Moayeri, Phillip Pope, Yogesh Balaji, Soheil Feizi

Paper Abstract

While datasets with single-label supervision have propelled rapid advances in image classification, additional annotations are necessary to quantitatively assess how models make predictions. To this end, for a subset of ImageNet samples, we collect segmentation masks for the entire object and $18$ informative attributes. We call this dataset RIVAL10 (RIch Visual Attributes with Localization); it consists of roughly $26k$ instances over $10$ classes. Using RIVAL10, we evaluate the sensitivity of a broad set of models to noise corruptions in foregrounds, backgrounds, and attributes. In our analysis, we consider diverse state-of-the-art architectures (ResNets, Transformers) and training procedures (CLIP, SimCLR, DeiT, Adversarial Training). We find that, somewhat surprisingly, adversarial training makes ResNets more sensitive to backgrounds, relative to foregrounds, than standard training does. Similarly, contrastively trained models also have lower relative foreground sensitivity, in both Transformers and ResNets. Lastly, we observe an intriguing adaptive ability of Transformers: their relative foreground sensitivity increases as the corruption level increases. Using saliency methods, we automatically discover spurious features that drive the background sensitivity of models and assess the alignment of saliency maps with foregrounds. Finally, we quantitatively study the attribution problem for neural features by comparing feature saliency with ground-truth localization of semantic attributes.
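
As a concrete illustration of the masked-corruption protocol the abstract describes, the sketch below adds Gaussian noise to only the foreground (or only the background) of an image using its segmentation mask, then compares accuracy under the two corruptions. This is a minimal sketch assuming images scaled to $[0, 1]$ and per-pixel foreground masks such as those RIVAL10 provides; the function names and the simple sensitivity score here are illustrative, not the paper's exact implementation or metric.

```python
import torch

def corrupt_region(image, mask, sigma, region="foreground"):
    """Add Gaussian noise of scale `sigma` to one region of `image`.

    image: float tensor of shape (3, H, W) with values in [0, 1]
    mask:  float tensor of shape (1, H, W), 1.0 on the object (foreground)
    """
    noise = sigma * torch.randn_like(image)
    region_mask = mask if region == "foreground" else 1.0 - mask
    return (image + noise * region_mask).clamp(0.0, 1.0)

@torch.no_grad()
def accuracy_under_corruption(model, loader, sigma, region):
    """Classification accuracy when only the chosen region is corrupted."""
    correct = total = 0
    for images, masks, labels in loader:  # yields (image, mask, label) batches
        corrupted = torch.stack([
            corrupt_region(img, m, sigma, region)
            for img, m in zip(images, masks)
        ])
        preds = model(corrupted).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

def relative_foreground_sensitivity(model, loader, sigma=0.25):
    """One simple way to compare the two sensitivities: a model is relatively
    more foreground-sensitive when foreground noise hurts accuracy more than
    background noise does (positive score)."""
    acc_fg_noised = accuracy_under_corruption(model, loader, sigma, "foreground")
    acc_bg_noised = accuracy_under_corruption(model, loader, sigma, "background")
    return acc_bg_noised - acc_fg_noised
```

Sweeping `sigma` over increasing values with a score like this is also one way to probe the abstract's observation that Transformers' relative foreground sensitivity grows with the corruption level.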
