Paper Title
From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation
Paper Authors
Paper Abstract
Zero-shot learning has been actively studied for the image classification task to relieve the burden of annotating image labels. Interestingly, the semantic segmentation task requires even more labor-intensive pixel-wise annotation, yet zero-shot semantic segmentation has attracted only limited research interest. Thus, we focus on zero-shot semantic segmentation, which aims to segment unseen objects with only category-level semantic representations provided for unseen categories. In this paper, we propose a novel Context-aware feature Generation Network (CaGNet), which can synthesize context-aware pixel-wise visual features for unseen categories based on category-level semantic representations and pixel-wise contextual information. The synthesized features are used to finetune the classifier to enable segmenting unseen objects. Furthermore, we extend pixel-wise feature generation and finetuning to patch-wise feature generation and finetuning, which additionally considers inter-pixel relationships. Experimental results on Pascal-VOC, Pascal-Context, and COCO-stuff show that our method significantly outperforms the existing zero-shot semantic segmentation methods. Code is available at https://github.com/bcmi/CaGNetv2-Zero-Shot-Semantic-Segmentation.
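
To make the abstract's pipeline concrete, below is a minimal, hypothetical PyTorch sketch of the two stages it describes: a generator that synthesizes pixel-wise visual features from a category-level semantic representation combined with a pixel-wise contextual latent code, followed by finetuning a 1x1-convolution classifier on the synthesized features of unseen categories. This is not the authors' implementation (see the linked repository for that); all names, layer sizes, and dimensions (ContextAwareGenerator, semantic_dim=300, context_dim=16, feature_dim=256) are illustrative assumptions.

# A minimal, hypothetical sketch of the idea in the abstract, not CaGNet itself.
import torch
import torch.nn as nn

class ContextAwareGenerator(nn.Module):
    """Synthesizes pixel-wise visual features from a category-level semantic
    representation (e.g., a word embedding) and a pixel-wise contextual latent code."""
    def __init__(self, semantic_dim=300, context_dim=16, feature_dim=256):
        super().__init__()
        # 1x1 convolutions act independently per pixel, matching pixel-wise generation.
        self.net = nn.Sequential(
            nn.Conv2d(semantic_dim + context_dim, 512, kernel_size=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(512, feature_dim, kernel_size=1),
        )

    def forward(self, word_embedding, contextual_latent):
        # word_embedding: (B, semantic_dim) category-level semantics
        # contextual_latent: (B, context_dim, H, W) pixel-wise contextual codes
        b, _, h, w = contextual_latent.shape
        semantic_map = word_embedding[:, :, None, None].expand(b, -1, h, w)
        return self.net(torch.cat([semantic_map, contextual_latent], dim=1))

# Finetuning sketch: synthesized features for unseen categories are fed to the
# pixel-wise classifier so it learns to segment unseen objects.
feature_dim, num_classes = 256, 21
generator = ContextAwareGenerator(feature_dim=feature_dim)
classifier = nn.Conv2d(feature_dim, num_classes, kernel_size=1)
optimizer = torch.optim.SGD(classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

unseen_embedding = torch.randn(4, 300)             # stand-in word embeddings
context = torch.randn(4, 16, 8, 8)                 # stand-in contextual latent codes
labels = torch.randint(0, num_classes, (4, 8, 8))  # stand-in unseen-class labels

fake_features = generator(unseen_embedding, context).detach()  # only the classifier is finetuned
loss = criterion(classifier(fake_features), labels)
optimizer.zero_grad()
loss.backward()
loss_value = loss.item()
optimizer.step()

In this sketch each pixel's feature is generated independently; the patch-wise extension mentioned in the abstract would instead generate and finetune on small spatial patches of features, so that inter-pixel relationships within a patch are also modeled.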