Paper Title
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks
Paper Authors
Paper Abstract
The capability of the self-attention mechanism to model long-range dependencies has catapulted its deployment in vision models. Unlike convolution operators, self-attention offers an infinite receptive field and enables compute-efficient modeling of global dependencies. However, the existing state-of-the-art attention mechanisms incur high compute and/or parameter overheads and are hence unfit for compact convolutional neural networks (CNNs). In this work, we propose a simple yet effective "Ultra-Lightweight Subspace Attention Mechanism" (ULSAM), which infers a different attention map for each feature-map subspace. We argue that learning separate attention maps for each feature subspace enables multi-scale and multi-frequency feature representation, which is more desirable for fine-grained image classification. Our method of subspace attention is orthogonal and complementary to the existing state-of-the-art attention mechanisms used in vision models. ULSAM is end-to-end trainable and can be deployed as a plug-and-play module in pre-existing compact CNNs. Notably, our work is the first attempt to use a subspace attention mechanism to increase the efficiency of compact CNNs. To show the efficacy of ULSAM, we perform experiments with MobileNet-V1 and MobileNet-V2 as backbone architectures on ImageNet-1K and three fine-grained image classification datasets. We achieve $\approx$13% and $\approx$25% reductions in both the FLOPs and parameter counts of MobileNet-V2, with 0.27% and more-than-1% improvements in top-1 accuracy on ImageNet-1K and the fine-grained image classification datasets, respectively. Code and trained models are available at https://github.com/Nandan91/ULSAM.
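To illustrate the core idea in the abstract (splitting a feature map into channel subspaces and inferring a separate spatial attention map for each), a minimal PyTorch sketch is given below. The layer choices (a depthwise 1x1 convolution followed by a pointwise projection and a spatial softmax) and the names SubspaceAttention and num_subspaces are illustrative assumptions rather than the authors' exact design; refer to the linked repository for the official implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubspaceAttention(nn.Module):
    """Per-subspace spatial attention: channels are split into groups and a
    separate attention map is inferred for each group. Hypothetical layer
    choices; see the official ULSAM repository for the exact module."""

    def __init__(self, channels: int, num_subspaces: int):
        super().__init__()
        assert channels % num_subspaces == 0, "channels must divide evenly"
        self.num_subspaces = num_subspaces
        group_channels = channels // num_subspaces
        # One lightweight branch per subspace: a depthwise 1x1 conv, then a
        # pointwise conv that collapses the group to a single attention channel.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(group_channels, group_channels, kernel_size=1,
                          groups=group_channels, bias=False),
                nn.Conv2d(group_channels, 1, kernel_size=1, bias=False),
            )
            for _ in range(num_subspaces)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        subspaces = torch.chunk(x, self.num_subspaces, dim=1)
        outputs = []
        for feat, branch in zip(subspaces, self.branches):
            logits = branch(feat)                            # (N, 1, H, W)
            n, _, h, w = logits.shape
            # Softmax over spatial positions yields one attention map per subspace.
            attn = F.softmax(logits.flatten(2), dim=-1).view(n, 1, h, w)
            outputs.append(feat + feat * attn)               # residual re-weighting
        return torch.cat(outputs, dim=1)                     # same shape as input
```

For example, `SubspaceAttention(64, num_subspaces=4)` applied to an (N, 64, H, W) tensor re-weights each 16-channel subspace with its own attention map and returns a tensor of the same shape, which is what allows such a module to be dropped between existing layers of a compact CNN.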