UniPose: Unified Human Pose Estimation in Single Images and Videos

Paper title

UniPose: Unified Human Pose Estimation in Single Images and Videos

Authors

Bruno Artacho, Andreas Savakis

Abstract

We propose UniPose, a unified framework for human pose estimation, based on our "Waterfall" Atrous Spatial Pooling architecture, that achieves state-of-the-art results on several pose estimation metrics. Current pose estimation methods that use standard CNN architectures rely heavily on statistical postprocessing or predefined anchor poses for joint localization. UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage, with high accuracy, without relying on statistical postprocessing methods. The Waterfall module in UniPose leverages the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Additionally, our method is extended to UniPose-LSTM for multi-frame processing and achieves state-of-the-art results for temporal pose estimation in videos. Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation, obtaining state-of-the-art results in single-person pose detection for both single images and videos.
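
The contrast the abstract draws between a cascade and a spatial pyramid can be illustrated with simple field-of-view arithmetic. The sketch below is not the authors' implementation: it assumes 3x3 atrous convolutions with stride 1, and the dilation rates (6, 12, 18, 24) are illustrative values that the abstract does not state. It uses the standard relation that a k x k kernel with dilation d spans k + (k - 1)(d - 1) pixels, and that stacking stride-1 layers adds each layer's span minus one to the field of view.

```python
def effective_kernel(kernel_size, dilation):
    """Span in pixels of one dilated (atrous) kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def parallel_fields_of_view(rates, kernel_size=3):
    """Pyramid-style layout: every branch sees the input directly,
    so each branch's field of view is just its own kernel span."""
    return [effective_kernel(kernel_size, r) for r in rates]


def cascade_fields_of_view(rates, kernel_size=3):
    """Cascade layout with an output tapped after every stage:
    each stage filters the previous stage's output, so the field
    of view accumulates from stage to stage (stride 1 assumed)."""
    fov = 1
    taps = []
    for r in rates:
        fov += effective_kernel(kernel_size, r) - 1
        taps.append(fov)
    return taps


# Illustrative dilation rates (an assumption, not taken from the abstract).
example_rates = [6, 12, 18, 24]
print(parallel_fields_of_view(example_rates))  # [13, 25, 37, 49]
print(cascade_fields_of_view(example_rates))   # [13, 37, 73, 121]
```

Under these assumptions, tapping the cascade after each stage still yields a range of scales, which is the sense in which a cascaded design can keep multi-scale fields-of-view while reusing earlier filtering.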
