Paper Title
Few-Shot Learning Meets Transformer: Unified Query-Support Transformers for Few-Shot Classification
Paper Authors
Paper Abstract
Few-shot classification, which aims to recognize unseen classes using very limited samples, has attracted increasing attention. It is usually formulated as a metric learning problem. The core issues of few-shot classification are how to learn (1) consistent representations for images in both the support and query sets and (2) an effective metric between images in the support and query sets. In this paper, we show that these two challenges can be modeled simultaneously via a unified Query-Support TransFormer (QSFormer) model. Specifically, the proposed QSFormer involves a global query-support sample Transformer (sampleFormer) branch and a local patch Transformer (patchFormer) branch. The sampleFormer branch aims to capture the dependencies among samples in the support and query sets for image representation. It adopts an Encoder, a Decoder and Cross-Attention to model the support representation, the query (image) representation and the metric learning for the few-shot classification task, respectively. Also, as a complement to this global learning branch, we adopt a local patch Transformer to extract a structural representation for each image sample by capturing the long-range dependencies of local image patches. In addition, a novel Cross-scale Interactive Feature Extractor (CIFE) is proposed to extract and fuse multi-scale CNN features as an effective backbone module for the proposed few-shot learning method. All modules are integrated into a unified framework and trained in an end-to-end manner. Extensive experiments on four popular datasets demonstrate the effectiveness and superiority of the proposed QSFormer.
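The abstract describes QSFormer's three components: the CIFE backbone, the sampleFormer global branch (Transformer encoder over support samples, decoder over query samples, with cross-attention serving the metric-learning role), and the patchFormer local branch over image patches. The following minimal PyTorch sketch shows how such a pipeline could be wired together; the simplified two-stage CNN standing in for CIFE, all layer sizes, the prototype/cosine metric head, and the way the two branches' scores are combined are illustrative assumptions, not the authors' implementation.

```python
# Conceptual sketch of the QSFormer pipeline described in the abstract (PyTorch).
# Component names follow the abstract; internal details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CIFE(nn.Module):
    """Stand-in for the Cross-scale Interactive Feature Extractor:
    two CNN stages whose multi-scale outputs are fused into one feature map."""
    def __init__(self, dim=64):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, dim, 3, 2, 1), nn.BatchNorm2d(dim), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(dim, dim, 3, 2, 1), nn.BatchNorm2d(dim), nn.ReLU())
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x):
        f1 = self.stage1(x)                            # coarser scale
        f2 = self.stage2(f1)                           # finer scale
        f1 = F.adaptive_avg_pool2d(f1, f2.shape[-2:])  # align spatial sizes
        return self.fuse(torch.cat([f1, f2], dim=1))   # (B, dim, H, W)


class SampleFormer(nn.Module):
    """Global branch: encoder over support samples, decoder over query samples;
    the decoder's cross-attention lets queries attend to the support set."""
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        dec = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.decoder = nn.TransformerDecoder(dec, layers)

    def forward(self, support, query):
        # support: (1, N*K, dim), query: (1, Q, dim) -- sample-level tokens
        memory = self.encoder(support)       # support representation
        return self.decoder(query, memory)   # query representation refined by support


class PatchFormer(nn.Module):
    """Local branch: Transformer encoder over an image's own patch tokens."""
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)

    def forward(self, patch_tokens):         # (B, H*W, dim)
        return self.encoder(patch_tokens).mean(dim=1)


class QSFormer(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.backbone = CIFE(dim)
        self.sample_former = SampleFormer(dim)
        self.patch_former = PatchFormer(dim)

    def embed(self, images):
        fmap = self.backbone(images)                  # (B, dim, H, W)
        patches = fmap.flatten(2).transpose(1, 2)     # (B, H*W, dim)
        return fmap.mean(dim=(2, 3)), patches         # global token + patch tokens

    def forward(self, support_imgs, support_labels, query_imgs, n_way):
        s_glob, s_patch = self.embed(support_imgs)
        q_glob, q_patch = self.embed(query_imgs)

        # Global query-support branch (sample-level cross-attention).
        q_refined = self.sample_former(s_glob[None], q_glob[None])[0]

        # Local branch: structural embedding from patch tokens.
        s_local, q_local = self.patch_former(s_patch), self.patch_former(q_patch)

        # Class prototypes from the support set, then cosine-similarity scores
        # (an illustrative metric head, not necessarily the paper's).
        def prototypes(feat):
            return torch.stack([feat[support_labels == c].mean(0) for c in range(n_way)])

        scores_g = F.normalize(q_refined, dim=-1) @ F.normalize(prototypes(s_glob), dim=-1).T
        scores_l = F.normalize(q_local, dim=-1) @ F.normalize(prototypes(s_local), dim=-1).T
        return scores_g + scores_l                    # (Q, n_way) episode logits
```

In a standard N-way K-shot episode, `support_imgs` holds the N*K labeled images, `support_labels` their class indices in [0, N), and the returned logits are trained with cross-entropy against the query labels, matching the end-to-end training described in the abstract.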