Paper Title

Region-based Non-local Operation for Video Classification

Paper Authors

Huang, Guoxi; Bors, Adrian G.

Paper Abstract

Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes optimization difficult. This paper presents region-based non-local (RNL) operations, a family of self-attention mechanisms that directly capture long-range dependencies without requiring a deep stack of local operations. Given an intermediate feature map, our method recalibrates the feature at each position by aggregating information from the neighboring regions of all positions. By combining a channel attention module with the proposed RNL operation, we design an attention chain that can be integrated into off-the-shelf CNNs for end-to-end training. We evaluate our method on two video classification benchmarks. Our method outperforms other attention mechanisms, and it achieves state-of-the-art performance on the Something-Something V1 dataset.
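The abstract describes the RNL operation and the attention chain only in words. The NumPy sketch below illustrates the idea under several assumptions and is not the authors' implementation. The assumptions are as follows. Each position's region embedding is a plain mean over a k x k x k spatiotemporal window, whereas the paper may use a learned embedding. Pairwise affinities are softmax-normalized dot products of those region descriptors, in the style of embedded-Gaussian non-local blocks but without learned projections. The channel attention module is a Squeeze-and-Excitation-style gate with weights `w1` and `w2`, and it is applied before RNL in the chain. All function names (`region_descriptors`, `rnl`, `channel_attention`, `attention_chain`) are hypothetical.

```python
import numpy as np


def region_descriptors(x, k=3):
    """Mean of each position's k x k x k spatiotemporal neighborhood.

    x: feature map of shape (C, T, H, W); k should be odd so the window is
    centred. Borders are zero-padded. ASSUMPTION: plain averaging stands in
    for whatever region embedding the paper actually uses.
    """
    c, t, h, w = x.shape
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p), (p, p)))  # zero padding
    out = np.zeros(x.shape)
    for dt in range(k):
        for di in range(k):
            for dj in range(k):
                out += padded[:, dt:dt + t, di:di + h, dj:dj + w]
    return out / k**3


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def rnl(x, k=3):
    """Hypothetical region-based non-local step with a residual connection.

    Every position i gathers the features of all positions j, weighted by the
    similarity of the region around i to the region around j.
    """
    c = x.shape[0]
    n = x[0].size                                  # N = T * H * W positions
    r = region_descriptors(x, k).reshape(c, n)     # (C, N) region descriptors
    v = x.reshape(c, n)                            # (C, N) features to aggregate
    attn = softmax(r.T @ r, axis=-1)               # (N, N), each row sums to 1
    y = v @ attn.T                                 # y[:, i] = sum_j attn[i, j] * v[:, j]
    return x + y.reshape(x.shape)


def channel_attention(x, w1, w2):
    """ASSUMED Squeeze-and-Excitation-style channel gate."""
    s = x.mean(axis=(1, 2, 3))                     # squeeze: one value per channel, (C,)
    z = np.maximum(w1 @ s, 0.0)                    # bottleneck + ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))            # per-channel sigmoid gate, (C,)
    return x * g[:, None, None, None]


def attention_chain(x, w1, w2, k=3):
    """Channel attention followed by RNL (order assumed for illustration)."""
    return rnl(channel_attention(x, w1, w2), k)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 7, 7))              # (C, T, H, W) toy clip features
w1 = 0.1 * rng.standard_normal((2, 8))             # reduction ratio 4 (assumed)
w2 = 0.1 * rng.standard_normal((8, 2))
print(attention_chain(x, w1, w2).shape)            # (8, 4, 7, 7)
```

Because `attn` is N x N with N = T·H·W, memory in this sketch grows quadratically with the clip's spatiotemporal size. Averaging over regions before computing similarities is what separates it from a plain non-local block, which compares single positions.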