Paper Title

Multi-scale Context-aware Network with Transformer for Gait Recognition

Authors

Duowang Zhu, Xiaohu Huang, Xinggang Wang, Bo Yang, Botao He, Wenyu Liu, Bin Feng

Abstract

Gait recognition has drawn increasing research attention recently. Because silhouette differences are quite subtle in the spatial domain, temporal feature representation is crucial for the task. Inspired by the observation that humans can distinguish the gaits of different subjects by adaptively focusing on clips of varying time scales, we propose a multi-scale context-aware network with transformer (MCAT) for gait recognition. MCAT generates temporal features across three scales and adaptively aggregates them using contextual information from both local and global perspectives. Specifically, MCAT contains an adaptive temporal aggregation (ATA) module that performs local relation modeling followed by global relation modeling to fuse the multi-scale features. In addition, to remedy the spatial feature corruption resulting from temporal operations, MCAT incorporates a salient spatial feature learning (SSFL) module to select groups of discriminative spatial features. Extensive experiments conducted on three datasets demonstrate state-of-the-art performance. Concretely, we achieve rank-1 accuracies of 98.7%, 96.2%, and 88.7% under normal walking, bag-carrying, and coat-wearing conditions on CASIA-B, 97.5% on OU-MVLP, and 50.6% on GREW. The source code will be available at https://github.com/zhuduowang/MCAT.git.
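The idea of generating temporal features at three scales and fusing them adaptively can be illustrated with a small sketch. This is not the authors' ATA module, whose relation modeling is learned. The function names (`temporal_mean`, `aggregate_multi_scale`), the window size of 3, and the dot-product scoring rule are all assumptions chosen for illustration only.

```python
import math

def temporal_mean(frames, window):
    """Sliding-window mean over time (stride 1, window clipped at sequence edges)."""
    half = window // 2
    out = []
    for t in range(len(frames)):
        lo, hi = max(0, t - half), min(len(frames), t + half + 1)
        clip = frames[lo:hi]
        out.append([sum(col) / len(clip) for col in zip(*clip)])
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_multi_scale(frames, short_window=3):
    """Fuse frame-level, short-term, and long-term features for each frame.

    frames: list of per-frame feature vectors (lists of floats, equal length).
    The weighting below is a hypothetical stand-in for learned relation
    modeling: each scale is scored by its dot product with the global mean.
    """
    n = len(frames)
    short_term = temporal_mean(frames, short_window)        # local context
    global_mean = [sum(col) / n for col in zip(*frames)]    # whole-sequence context
    fused = []
    for frame_feat, short_feat in zip(frames, short_term):
        scales = (frame_feat, short_feat, global_mean)
        scores = [sum(a * b for a, b in zip(x, global_mean)) for x in scales]
        weights = softmax(scores)
        fused.append([
            sum(w * x[d] for w, x in zip(weights, scales))
            for d in range(len(frame_feat))
        ])
    return fused

if __name__ == "__main__":
    sequence = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
    for row in aggregate_multi_scale(sequence):
        print(row)
```

Each output vector is a convex combination of the three scale features for that frame, so a sequence whose frames are all identical is returned unchanged.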
