Paper Title
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
Paper Authors
Paper Abstract
On public benchmarks, current action recognition techniques have achieved great success. However, in real-world applications such as sports analysis, which requires parsing an activity into phases and differentiating between subtly different actions, their performance remains far from satisfactory. To take action recognition to a new level, we develop FineGym, a new dataset built on top of gymnastics videos. Compared to existing action recognition datasets, FineGym is distinguished in richness, quality, and diversity. In particular, it provides temporal annotations at both the action and sub-action levels with a three-level semantic hierarchy. For example, a "balance beam" event is annotated as a sequence of elementary sub-actions derived from five sets: "leap-jump-hop", "beam-turns", "flight-salto", "flight-handspring", and "dismount", where the sub-action in each set is further annotated with a finely defined class label. This new level of granularity presents significant challenges for action recognition, e.g., how to parse the temporal structures from a coherent action, and how to distinguish between subtly different action classes. We systematically investigate representative methods on this dataset and obtain a number of interesting findings. We hope this dataset can advance research towards action understanding.
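To make the three-level hierarchy in the abstract concrete, the following is a minimal sketch of how an event, its sets, and its finely labeled sub-actions might be represented. Only the event name "balance beam" and the five set names come from the abstract. The class names, field names, timestamps, and element labels are hypothetical and do not reflect the dataset's actual annotation format.

```python
from dataclasses import dataclass, field
from typing import List

# The five balance-beam sets quoted in the abstract.
BALANCE_BEAM_SETS = (
    "leap-jump-hop",
    "beam-turns",
    "flight-salto",
    "flight-handspring",
    "dismount",
)


@dataclass
class SubAction:
    """Lowest level: one temporally localized sub-action (hypothetical schema)."""
    start: float          # seconds from the start of the video (assumed unit)
    end: float
    set_name: str         # middle level: one of the sets above
    element_label: str    # fine-grained class label within the set (placeholder)


@dataclass
class Event:
    """Top level: an event such as "balance beam" with its ordered sub-actions."""
    name: str
    start: float
    end: float
    sub_actions: List[SubAction] = field(default_factory=list)

    def add(self, sub: SubAction) -> None:
        # Reject labels outside the known sets for this event.
        if sub.set_name not in BALANCE_BEAM_SETS:
            raise ValueError(f"unknown set: {sub.set_name!r}")
        # A sub-action must lie inside the event's temporal extent.
        if not (self.start <= sub.start < sub.end <= self.end):
            raise ValueError("sub-action must fall within the event boundaries")
        # Keep the sequence in temporal order with no overlap.
        if self.sub_actions and sub.start < self.sub_actions[-1].end:
            raise ValueError("sub-actions must be added in temporal order")
        self.sub_actions.append(sub)

    def set_sequence(self) -> List[str]:
        """Return the coarse (set-level) label sequence for the event."""
        return [s.set_name for s in self.sub_actions]


# Hypothetical example: timestamps and element labels are made up.
routine = Event(name="balance beam", start=0.0, end=90.0)
routine.add(SubAction(12.0, 14.5, "leap-jump-hop", "placeholder-element-a"))
routine.add(SubAction(30.0, 32.0, "beam-turns", "placeholder-element-b"))
routine.add(SubAction(84.0, 88.0, "dismount", "placeholder-element-c"))
```

Separating the set name from the element label mirrors the coarse-to-fine structure the abstract describes: a model can be evaluated on set-level sequences via `set_sequence()` or on the finer element labels, depending on the granularity of interest.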