Paper Title

Efficient Human Vision Inspired Action Recognition using Adaptive Spatiotemporal Sampling

Authors

Mac, Khoi-Nguyen C., Do, Minh N., Vo, Minh P.

Abstract

Adaptive sampling that exploits the spatiotemporal redundancy in videos is critical for always-on action recognition on wearable devices with limited computing and battery resources. The commonly used fixed sampling strategy is not context-aware and may under-sample the visual content, which adversely impacts both computational efficiency and accuracy. Inspired by the concepts of foveal vision and pre-attentive processing from the human visual perception mechanism, we introduce a novel adaptive spatiotemporal sampling scheme for efficient action recognition. Our system pre-scans the global scene context at low resolution and decides to skip or to request high-resolution features at salient regions for further processing. We validate the system on the EPIC-KITCHENS and UCF-101 datasets for action recognition, and show that our proposed approach can greatly speed up inference with a tolerable loss of accuracy compared with state-of-the-art baselines. Source code is available at https://github.com/knmac/adaptive_spatiotemporal.
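The pre-scan-then-decide loop described in the abstract can be sketched as follows. This is a toy illustration, not the paper's implementation: the saliency score (normalized variance of a subsampled frame), the threshold, and the function names are all assumptions made for clarity.

```python
import numpy as np

def prescan_saliency(frame_lowres):
    # Toy saliency proxy: variance of the low-resolution frame,
    # squashed into [0, 1). A real system would use a learned model.
    v = float(np.var(frame_lowres))
    return v / (v + 1.0)

def adaptive_spatiotemporal_sampling(frames, threshold=0.3, stride=4):
    """Pre-scan each frame at low resolution; request high-resolution
    processing only for frames whose saliency exceeds the threshold.

    frames    -- list of 2-D numpy arrays (grayscale frames)
    threshold -- saliency level above which a frame is kept
    stride    -- subsampling factor for the cheap low-res pre-scan
    Returns the indices of frames selected for high-res feature extraction.
    """
    selected = []
    for t, frame in enumerate(frames):
        lowres = frame[::stride, ::stride]  # cheap low-resolution pre-scan
        if prescan_saliency(lowres) >= threshold:
            selected.append(t)  # salient: request high-res features here
        # otherwise skip the frame entirely, saving computation
    return selected
```

A nearly uniform frame yields low variance and is skipped, while a textured frame is forwarded for high-resolution processing; this gating is what trades a small accuracy loss for faster inference.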
