Open-Vocabulary Attribute Detection

Paper Title

Open-vocabulary Attribute Detection

Authors

María A. Bravo, Sudhanshu Mittal, Simon Ging, Thomas Brox

Abstract

Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models. To this end, we created a clean and densely annotated test set covering 117 attribute classes on the 80 object classes of MS COCO. It includes positive and negative annotations, which enables open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million annotations. For reference, we provide a first baseline method for open-vocabulary attribute detection. Moreover, we demonstrate the benchmark's value by studying the attribute detection performance of several foundation models. Project page https://ovad-benchmark.github.io
