Paper Title

CTM: Collaborative Temporal Modeling for Action Recognition

Paper Authors

Qian Liu, Tao Wang, Jie Liu, Yang Guan, Qi Bu, Longfei Yang

Paper Abstract

With the rapid development of digital multimedia, video understanding has become an important field. For action recognition, the temporal dimension plays an important role, which makes the task quite different from image recognition. To learn powerful features from videos, we propose a Collaborative Temporal Modeling (CTM) block (Figure 1) that learns temporal information for action recognition. Besides a parameter-free identity shortcut, CTM, as a separate temporal modeling block, includes two collaborative paths: a spatial-aware temporal modeling path, built with our proposed Temporal-Channel Convolution Module (TCCM), which uses unshared parameters for each spatial position (H*W), and a spatial-unaware temporal modeling path. CTM blocks can be seamlessly inserted into many popular networks to form CTM Networks, bringing the ability to learn temporal information to 2D CNN backbones that only capture spatial information. Experiments on several popular action recognition datasets demonstrate that CTM blocks improve performance over 2D CNN baselines, and that our method achieves competitive results against state-of-the-art methods. Code will be made publicly available.
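
The paper's implementation is not reproduced on this page, but the structure described in the abstract can be sketched. Below is a minimal PyTorch sketch under several assumptions: TCCM is realized as a grouped temporal convolution with one group per spatial position, so weights are unshared across the H*W locations; the spatial-unaware path is a temporal convolution shared by all positions; and the identity shortcut and the two paths are fused by simple summation. All names here (CTMBlock, tccm, shared_temporal, t_kernel) are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn


class CTMBlock(nn.Module):
    """Rough sketch of a Collaborative Temporal Modeling (CTM) block.

    Input/output shape: (N, C, T, H, W). How the two paths are fused and
    the exact TCCM formulation are assumptions, not the paper's method.
    """

    def __init__(self, channels: int, height: int, width: int, t_kernel: int = 3):
        super().__init__()
        self.h, self.w = height, width
        hw = height * width
        # Spatial-aware path (TCCM-like): grouped temporal convolution with
        # one group per spatial position, so parameters are NOT shared
        # across the H*W locations; each group mixes C channels over time.
        self.tccm = nn.Conv1d(
            in_channels=hw * channels,
            out_channels=hw * channels,
            kernel_size=t_kernel,
            padding=t_kernel // 2,
            groups=hw,  # unshared parameters per spatial position
        )
        # Spatial-unaware path: temporal convolution whose weights are
        # shared by every spatial position (kernel spans time only).
        self.shared_temporal = nn.Conv3d(
            channels, channels,
            kernel_size=(t_kernel, 1, 1),
            padding=(t_kernel // 2, 0, 0),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, t, h, w = x.shape
        assert (h, w) == (self.h, self.w), "TCCM weights are tied to a fixed H*W"
        # (N, C, T, H, W) -> (N, H*W*C, T) so each spatial position is a group.
        xa = x.permute(0, 3, 4, 1, 2).reshape(n, h * w * c, t)
        xa = self.tccm(xa).reshape(n, h, w, c, t).permute(0, 3, 4, 1, 2)
        xb = self.shared_temporal(x)
        # Parameter-free identity shortcut; summing the paths is an assumption.
        return x + xa + xb


if __name__ == "__main__":
    x = torch.randn(2, 64, 8, 14, 14)  # (N, C, T, H, W)
    block = CTMBlock(channels=64, height=14, width=14)
    print(block(x).shape)  # torch.Size([2, 64, 8, 14, 14])
```

Note that the grouped Conv1d ties the unshared TCCM weights to a fixed spatial size, which is why H and W are passed at construction time; this is a convenience of the sketch, and the paper may handle spatial dimensions differently.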
