Paper title
Decoupled Spatial-Temporal Attention Network for Skeleton-Based Action Recognition
Paper authors
Paper abstract
Dynamic skeletal data, represented as the 2D/3D coordinates of human joints, has been widely studied for human action recognition because of its high-level semantic information and environmental robustness. However, previous methods rely heavily on hand-crafted traversal rules or graph topologies to capture dependencies between joints, which limits their performance and generalizability. In this work, we present a novel decoupled spatial-temporal attention network (DSTA-Net) for skeleton-based action recognition. It is built solely from attention blocks, which allows it to model spatial-temporal dependencies between joints without knowing their positions or mutual connections. Specifically, to meet the requirements of skeletal data, three techniques are proposed for building the attention blocks: spatial-temporal attention decoupling, decoupled position encoding, and spatial global regularization. In addition, on the data side, we introduce a skeletal data decoupling technique that emphasizes the specific characteristics of space/time and of different motion scales, resulting in a more comprehensive understanding of human actions. To test the effectiveness of the proposed method, extensive experiments are conducted on four challenging datasets for skeleton-based gesture and action recognition: SHREC, DHG, NTU-60, and NTU-120. DSTA-Net achieves state-of-the-art performance on all of them.
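To make the idea of spatial-temporal attention decoupling more concrete, the sketch below shows one plausible reading of it: attention is first applied across joints within each frame, then across frames for each joint, instead of over all joint-frame pairs at once. This is an illustrative assumption, not the authors' implementation. The function names (`self_attention`, `decoupled_st_attention`) are hypothetical, the query/key/value projections are replaced by the identity, and decoupled position encoding and spatial global regularization are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: Q, K and V are the inputs themselves
    (identity projections), so there are no learned weights.
    tokens: list of n feature vectors, each a list of d floats.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def decoupled_st_attention(seq):
    """seq[t][j] is the feature vector of joint j at frame t."""
    num_frames, num_joints = len(seq), len(seq[0])
    # Spatial step: joints attend to one another within each frame.
    spatial = [self_attention(frame) for frame in seq]
    # Temporal step: each joint attends to its own features across frames.
    per_joint = [
        self_attention([spatial[t][j] for t in range(num_frames)])
        for j in range(num_joints)
    ]
    # Transpose back to the [frame][joint][channel] layout.
    return [[per_joint[j][t] for j in range(num_joints)] for t in range(num_frames)]


# Toy input: 3 frames, 2 joints, 3D coordinates per joint.
seq = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.1, 0.0], [1.0, 0.2, 0.0]],
    [[0.0, 0.2, 0.0], [1.0, 0.4, 0.0]],
]
out = decoupled_st_attention(seq)
print(len(out), len(out[0]), len(out[0][0]))  # prints: 3 2 3
```

Under this reading, no skeleton graph or joint ordering is supplied anywhere: the attention weights are computed from the features alone. Factoring the attention also keeps each softmax over either the joints or the frames, rather than over every joint-frame pair.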