Paper Title
PERF-Net: Pose Empowered RGB-Flow Net
Paper Authors

Paper Abstract
In recent years, many works in the video action recognition literature have shown that two-stream models (combining spatial and temporal input streams) are necessary for achieving state-of-the-art performance. In this paper we show the benefits of including yet another stream based on human pose estimated from each frame -- specifically, by rendering pose on the input RGB frames. At first blush, this additional stream may seem redundant given that human pose is fully determined by the RGB pixel values -- however, we show (perhaps surprisingly) that this simple and flexible addition provides complementary gains. Using this insight, we propose a new model, which we dub PERF-Net (short for Pose Empowered RGB-Flow Net), that combines this new pose stream with the standard RGB and flow-based input streams via distillation techniques. We show that our model outperforms the state of the art by a large margin on a number of human action recognition datasets, while not requiring flow or pose to be explicitly computed at inference time. The proposed pose stream is also part of the winning solution of the ActivityNet Kinetics Challenge 2020.
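The abstract does not spell out the distillation objective, but the idea of transferring the pose and flow streams into an RGB-only student can be illustrated with a generic multi-teacher distillation loss: soften the per-stream teacher logits with a temperature, average them, and train the student to match that averaged distribution. This is a minimal sketch under those assumptions, not the paper's exact formulation; the function names and the averaging scheme are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_teacher_distillation_loss(student_logits, teacher_logits_list,
                                    temperature=2.0):
    """Cross-entropy between the averaged temperature-softened teacher
    predictions (e.g. RGB, flow, and pose streams) and the student's
    softened predictions. Hypothetical stand-in for PERF-Net's loss."""
    teacher_probs = np.mean(
        [softmax(t / temperature) for t in teacher_logits_list], axis=0)
    log_student = np.log(softmax(student_logits / temperature))
    # Per-example cross-entropy, averaged over the batch.
    return float(-np.sum(teacher_probs * log_student, axis=-1).mean())
```

Because the teachers (flow and pose streams) are only consulted during training, the distilled student needs just RGB frames at inference time, which is the property the abstract highlights.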