Paper Title

Action Recognition with Deep Multiple Aggregation Networks

Authors

Ahmed Mazari, Hichem Sahbi

Abstract

Most current action recognition algorithms are based on deep networks that stack multiple convolutional, pooling and fully connected layers. While convolutional and fully connected operations have been widely studied in the literature, the design of pooling operations for action recognition, where action categories exhibit different temporal granularities, has received comparatively less attention, and existing solutions rely mainly on max or average operations. The latter cannot fully capture the actual temporal granularity of action categories and thereby constitute a bottleneck in classification performance. In this paper, we introduce a novel hierarchical pooling design that captures different levels of temporal granularity in action recognition. Our design principle is coarse-to-fine and is achieved using a tree-structured network; as we traverse this network top-down, pooling operations become less invariant but temporally more resolute and better localized. The combination of operations in this network that best fits a given ground truth is learned by solving a constrained minimization problem whose solution corresponds to the distribution of weights that capture the contribution of each level (and thereby each temporal granularity) in the global hierarchical pooling process. Besides being principled and well grounded, the proposed hierarchical pooling is also video-length and resolution agnostic. Extensive experiments conducted on the challenging UCF-101, HMDB-51 and JHMDB-21 databases corroborate all these statements.
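
The coarse-to-fine pooling idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several choices here are assumptions made only for illustration: the function names `softmax` and `hierarchical_pool`, the use of a binary tree with average pooling at every node, and the softmax parametrization that keeps the level weights nonnegative and summing to one. The abstract states only that the weights come from a constrained minimization problem; the training step that would learn them is omitted.

```python
import numpy as np


def softmax(z):
    """Map unconstrained scores to nonnegative weights that sum to one."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def hierarchical_pool(frames, level_logits):
    """Pool per-frame features over a binary temporal tree (illustrative only).

    frames       : array of shape (T, D), one D-dimensional feature per frame.
    level_logits : one unconstrained score per tree level (hypothetical
                   parametrization); softmax turns them into level weights.

    Level 0 averages the whole video, level 1 averages each half, level 2
    each quarter, and so on. The output has D * (2**depth - 1) entries,
    which does not depend on the number of frames T.
    """
    frames = np.asarray(frames, dtype=float)
    depth = len(level_logits)
    if frames.shape[0] < 2 ** (depth - 1):
        raise ValueError("need at least one frame per finest-level segment")

    weights = softmax(level_logits)
    parts = []
    for level, w in enumerate(weights):
        # Split the time axis into 2**level contiguous segments.
        segments = np.array_split(frames, 2 ** level, axis=0)
        # Average-pool each segment, then scale by the level weight.
        pooled = np.concatenate([seg.mean(axis=0) for seg in segments])
        parts.append(w * pooled)
    return np.concatenate(parts)


# Two videos of different lengths map to descriptors of the same size.
rng = np.random.default_rng(0)
short_video = rng.normal(size=(10, 4))
long_video = rng.normal(size=(37, 4))
print(hierarchical_pool(short_video, [0.0, 0.0, 0.0]).shape)
print(hierarchical_pool(long_video, [0.0, 0.0, 0.0]).shape)
```

In this sketch the top level is the most invariant, since it ignores frame order entirely, while deeper levels keep more temporal localization. The weights decide how much each granularity contributes to the final descriptor.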
