Title
SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition
Authors
Abstract
Recognizing an activity from a single reference sample using metric learning approaches is a promising research field. The majority of few-shot methods focus on object recognition or face identification. We propose a metric learning approach that reduces the action recognition problem to a nearest-neighbor search in embedding space. We encode signals into images and extract features using a deep residual CNN. Using a triplet loss, we learn a feature embedding. The resulting encoder transforms features into an embedding space in which smaller distances encode similar actions and larger distances encode different actions. Our approach is based on a signal-level formulation and remains flexible across a variety of modalities. It outperforms the baseline on the large-scale NTU RGB+D 120 dataset for the one-shot action recognition protocol by 5.6%. With just 60% of the training data, our approach still outperforms the baseline by 3.7%. With 40% of the training data, our approach performs comparably to the second-best follow-up approach. Further, we show that our approach generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton, and fused data, and on the Simitate dataset for motion capture data. Furthermore, our inter-joint and inter-sensor experiments suggest good capabilities on previously unseen setups.
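The abstract names two core ingredients: a triplet loss that shapes the embedding space, and a nearest-neighbor search over reference embeddings for one-shot classification. The sketch below illustrates both with NumPy; the toy 2-D vectors stand in for the deep residual CNN embeddings and the action labels are hypothetical, not taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: the positive should sit closer to the anchor
    than the negative by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def one_shot_classify(query, references):
    """Nearest-neighbor search in embedding space: return the label
    of the single reference embedding closest to the query."""
    return min(references,
               key=lambda label: np.linalg.norm(query - references[label]))

# Toy reference embeddings (one sample per class, as in one-shot recognition).
refs = {"wave": np.array([1.0, 0.0]), "kick": np.array([0.0, 1.0])}
query = np.array([0.9, 0.1])
print(one_shot_classify(query, refs))  # prints "wave"

# A well-separated triplet incurs zero loss.
loss = triplet_loss(np.array([0.0, 0.0]),   # anchor
                    np.array([0.1, 0.0]),   # positive (close)
                    np.array([1.0, 0.0]))   # negative (far)
print(loss)  # prints 0.0
```

In training, the encoder is optimized so that this loss is small over many sampled triplets; at test time only the nearest-neighbor step is needed, which is what makes the one-shot setting tractable.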