Paper Title

ParCNetV2: Oversized Kernel with Enhanced Attention

Authors

Ruihan Xu, Haokui Zhang, Wenze Hu, Shiliang Zhang, Xiaoyu Wang

Abstract

Transformers have shown great potential in various computer vision tasks. By borrowing design concepts from transformers, many studies revolutionized CNNs and showed remarkable results. This paper falls in this line of studies. Specifically, we propose a new convolutional neural network, ParCNetV2, that extends position-aware circular convolution (ParCNet) with oversized convolutions and bifurcate gate units to enhance attention. The oversized convolution employs a kernel with twice the input size to model long-range dependencies through a global receptive field. Simultaneously, it achieves implicit positional encoding by removing the shift-invariant property from convolution kernels, i.e., the effective kernels at different spatial locations are different when the kernel size is twice as large as the input size. The bifurcate gate unit implements an attention mechanism similar to self-attention in transformers. It is applied through element-wise multiplication of the two branches, with one branch serving as the feature transformation and the other as attention weights. Additionally, we introduce a uniform local-global convolution block to unify the design of the early and late stage convolution blocks. Extensive experiments demonstrate the superiority of our method over other convolutional neural networks and hybrid models that combine CNNs and transformers. Code will be released.
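
The two mechanisms named in the abstract can be illustrated with a short PyTorch sketch. The module names (OversizedCircularConv, BifurcateGateUnit), the 2N-1 kernel length, and the GELU/Sigmoid branch choices below are illustrative assumptions for a 1-D case, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OversizedCircularConv(nn.Module):
    """Depth-wise 1-D convolution whose kernel spans nearly twice the input
    length. With circular padding, each spatial position effectively sees a
    different slice of the kernel, so the layer is not shift-invariant and
    acts as an implicit positional encoding. (Hypothetical sketch.)"""

    def __init__(self, channels: int, seq_len: int):
        super().__init__()
        self.channels = channels
        self.kernel_size = 2 * seq_len - 1  # "oversized": covers every circular shift
        self.weight = nn.Parameter(0.02 * torch.randn(channels, 1, self.kernel_size))

    def forward(self, x):                        # x: (batch, channels, seq_len)
        pad = self.kernel_size // 2              # = seq_len - 1, valid circular padding
        x = F.pad(x, (pad, pad), mode="circular")
        return F.conv1d(x, self.weight, groups=self.channels)


class BifurcateGateUnit(nn.Module):
    """Two parallel branches combined by element-wise multiplication: one
    transforms features, the other produces attention-like gating weights.
    (Hypothetical layer choices, not the authors' exact module.)"""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(dim, hidden), nn.GELU())    # feature transformation
        self.gate = nn.Sequential(nn.Linear(dim, hidden), nn.Sigmoid())  # attention weights
        self.proj = nn.Linear(hidden, dim)

    def forward(self, x):                        # x: (batch, tokens, dim)
        return self.proj(self.value(x) * self.gate(x))


if __name__ == "__main__":
    x = torch.randn(2, 64, 56)                             # 64-channel feature rows of length 56
    y = OversizedCircularConv(channels=64, seq_len=56)(x)  # global receptive field per row
    z = BifurcateGateUnit(dim=64, hidden=128)(x.transpose(1, 2))
    print(y.shape, z.shape)                                # (2, 64, 56) and (2, 56, 64)
```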
