Paper Title
Lightweight Structure-Aware Attention for Visual Understanding
Paper Authors
Paper Abstract
The attention operator has been widely used as a basic building block in visual understanding since it provides flexibility through its adjustable kernels. However, this operator suffers from inherent limitations: (1) the attention kernel is not discriminative enough, resulting in high redundancy, and (2) its computation and memory complexity is quadratic in the sequence length. In this paper, we propose a novel attention operator, called Lightweight Structure-aware Attention (LiSA), which offers better representation power with log-linear complexity. Our operator transforms the attention kernels to be more discriminative by learning structural patterns. These structural patterns are encoded by exploiting a set of relative position embeddings (RPEs) as multiplicative weights, thereby improving the representation power of the attention kernels. Additionally, the RPEs are approximated to obtain log-linear complexity. Our experiments and analyses demonstrate that the proposed operator outperforms self-attention and other existing operators, achieving state-of-the-art results on ImageNet-1K and other downstream tasks such as video action recognition on Kinetics-400, object detection & instance segmentation on COCO, and semantic segmentation on ADE-20K.
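To make the multiplicative-RPE idea concrete, below is a minimal PyTorch sketch of multi-head self-attention whose kernel is reweighted by learned relative position embeddings used as multiplicative weights, as the abstract describes. This is an illustrative reconstruction from the abstract alone, not the authors' implementation: the class name StructureAwareAttention and all shapes are hypothetical, and the log-linear RPE approximation that gives LiSA its complexity advantage is omitted, so this version remains quadratic in the sequence length.

```python
import torch
import torch.nn as nn

class StructureAwareAttention(nn.Module):
    """Illustrative sketch (not the authors' code): multi-head self-attention
    whose kernel is reweighted multiplicatively by learned relative position
    embeddings (RPEs). The log-linear RPE approximation from the paper is
    omitted, so this version is still quadratic in sequence length."""

    def __init__(self, dim: int, num_heads: int, max_len: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.max_len = max_len
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # One multiplicative weight per head and per relative offset,
        # initialized to 1 so training starts from plain self-attention.
        self.rpe = nn.Parameter(torch.ones(num_heads, 2 * max_len - 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape  # requires N <= max_len
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each to (B, heads, N, head_dim).
        q = q.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        # Standard scaled dot-product attention logits: (B, heads, N, N).
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        # Look up the RPE weight for every (query, key) pair by its
        # relative offset, shifted into the index range [0, 2*max_len - 2].
        idx = torch.arange(N, device=x.device)
        rel = idx[None, :] - idx[:, None] + self.max_len - 1
        attn = attn * self.rpe[:, rel]  # multiplicative structural reweighting
        out = attn.softmax(dim=-1) @ v
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Usage (hypothetical shapes): a 14x14 patch grid flattened to 196 tokens.
layer = StructureAwareAttention(dim=64, num_heads=4, max_len=196)
out = layer(torch.randn(2, 196, 64))  # -> (2, 196, 64)
```

Note the design choice this sketch highlights: unlike the additive relative-position bias common in vision transformers, the RPE here scales the attention logits, which matches the abstract's description of structural patterns acting on the kernel as multiplicative weights.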