Paper Title

E^2TAD: An Energy-Efficient Tracking-based Action Detector

Paper Authors

Xin Hu, Zhenyu Wu, Hao-Yu Miao, Siqi Fan, Taiyu Long, Zhenyu Hu, Pengcheng Pi, Yi Wu, Zhou Ren, Zhangyang Wang, Gang Hua

Paper Abstract

Video action detection (spatio-temporal action localization) is usually the starting point for human-centric intelligent video analysis. It has high practical impact on many applications across robotics, security, healthcare, etc. The two-stage paradigm of Faster R-CNN in object detection inspires the standard paradigm of video action detection, i.e., first generating person proposals and then classifying their actions. However, none of the existing solutions provides fine-grained action detection at the "who-when-where-what" level. This paper presents a tracking-based solution that accurately and efficiently localizes predefined key actions both spatially (by predicting the associated target IDs and locations) and temporally (by predicting the time as exact frame indices). This solution won first place in the UAV-Video Track of the 2021 Low-Power Computer Vision Challenge (LPCVC).
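To make the "who-when-where-what" output format concrete, below is a minimal Python sketch of how a tracking-based detector could aggregate per-frame tracker and classifier outputs into spatio-temporal action events. The `tracker` and `action_classifier` interfaces here are hypothetical placeholders for illustration, not the paper's actual modules.

```python
from dataclasses import dataclass, field

# Hypothetical "who-when-where-what" record:
# who -> track_id, what -> action label,
# when -> [start_frame, end_frame], where -> per-frame boxes.
@dataclass
class ActionEvent:
    track_id: int
    action: str
    start_frame: int
    end_frame: int
    boxes: list = field(default_factory=list)  # one (x, y, w, h) per frame

def detect_actions(frames, tracker, action_classifier, min_len=3):
    """Aggregate per-frame outputs into action events.

    Assumes `tracker(frame)` yields (track_id, box) pairs and
    `action_classifier(frame, box)` returns an action label or None;
    both are assumed interfaces, not the authors' implementation.
    """
    open_events = {}   # (track_id, action) -> ActionEvent still in progress
    finished = []

    for idx, frame in enumerate(frames):
        seen = set()
        for track_id, box in tracker(frame):
            action = action_classifier(frame, box)
            if action is None:
                continue
            key = (track_id, action)
            seen.add(key)
            ev = open_events.get(key)
            if ev is None:
                ev = ActionEvent(track_id, action, idx, idx)
                open_events[key] = ev
            ev.end_frame = idx
            ev.boxes.append(box)
        # Close events whose (track, action) pair vanished this frame.
        for key in list(open_events):
            if key not in seen:
                finished.append(open_events.pop(key))

    finished.extend(open_events.values())
    # Drop spurious detections shorter than min_len frames.
    return [ev for ev in finished if ev.end_frame - ev.start_frame + 1 >= min_len]
```

Each returned event carries a target ID (who), an action label (what), exact start and end frame indices (when), and per-frame boxes (where), which matches the granularity the abstract describes.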
