Paper Title

Hypergraph Transformer for Skeleton-based Action Recognition

Paper Authors

Yuxuan Zhou, Zhi-Qi Cheng, Chao Li, Yanwen Fang, Yifeng Geng, Xuansong Xie, Margret Keuper

Paper Abstract

Skeleton-based action recognition aims to recognize human actions given human joint coordinates with skeletal interconnections. By defining a graph with joints as vertices and their natural connections as edges, previous works successfully adopted Graph Convolutional Networks (GCNs) to model joint co-occurrences and achieved superior performance. More recently, a limitation of GCNs has been identified: the topology is fixed after training. To relax this restriction, the Self-Attention (SA) mechanism has been adopted to make the topology of GCNs adaptive to the input, resulting in state-of-the-art hybrid models. Concurrently, attempts with plain Transformers have also been made, but they still lag behind state-of-the-art GCN-based methods due to the lack of a structural prior. Unlike the hybrid models, we propose a more elegant solution that incorporates bone connectivity into the Transformer via a graph distance embedding. Our embedding retains the information of the skeletal structure throughout training, whereas GCNs merely use it for initialization. More importantly, we reveal an underlying issue of graph models in general: pairwise aggregation essentially ignores the high-order kinematic dependencies between body joints. To fill this gap, we propose a new self-attention mechanism on the hypergraph, termed Hypergraph Self-Attention (HyperSA), to incorporate intrinsic higher-order relations into the model. We name the resulting model Hyperformer; it beats state-of-the-art graph models with respect to accuracy and efficiency on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets.
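
The graph distance embedding lends itself to a compact illustration. Below is a minimal PyTorch sketch, not the authors' released code: it computes all-pairs hop distances on the skeleton graph and injects them into self-attention as a learned per-distance bias, so the skeletal structure remains visible to the model throughout training. The edge list, module names, and hyperparameters are illustrative assumptions; HyperSA's hyperedge-level relations go beyond what the abstract specifies and are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def graph_distances(edges, num_joints):
    """All-pairs hop distances on the skeleton graph (Floyd-Warshall)."""
    big = float(num_joints)  # any value larger than the true graph diameter
    d = torch.full((num_joints, num_joints), big)
    d.fill_diagonal_(0.0)
    for i, j in edges:
        d[i, j] = d[j, i] = 1.0
    for k in range(num_joints):
        d = torch.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d.long()

class GraphDistanceAttention(nn.Module):
    """Single-head self-attention over joints with a graph-distance bias."""
    def __init__(self, dim, max_dist):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5
        # One learnable bias per hop distance: a structural prior that is
        # kept and trained throughout, rather than used only at initialization.
        self.dist_bias = nn.Embedding(max_dist + 1, 1)

    def forward(self, x, dist):
        # x: (batch, joints, dim); dist: (joints, joints) integer hop counts
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = q @ k.transpose(-2, -1) * self.scale
        attn = attn + self.dist_bias(dist).squeeze(-1)  # add structural bias
        return F.softmax(attn, dim=-1) @ v

# Toy usage: a 5-joint chain standing in for a skeleton (hypothetical sizes)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
dist = graph_distances(edges, num_joints=5)
layer = GraphDistanceAttention(dim=64, max_dist=int(dist.max()))
out = layer(torch.randn(2, 5, 64), dist)  # -> (2, 5, 64)
```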
