Poseur: Direct Human Pose Regression with Transformers

Paper Title

Poseur: Direct Human Pose Regression with Transformers

Authors

Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang, Anton van den Hengel

Abstract

We propose a direct, regression-based approach to 2D human pose estimation from single images. We formulate the problem as a sequence prediction task, which we solve using a Transformer network. This network directly learns a regression mapping from images to the keypoint coordinates, without resorting to intermediate representations such as heatmaps. This approach avoids much of the complexity associated with heatmap-based approaches. To overcome the feature misalignment issues of previous regression-based methods, we propose an attention mechanism that adaptively attends to the features that are most relevant to the target keypoints, considerably improving the accuracy. Importantly, our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints. Experiments on MS-COCO and MPII, two predominant pose-estimation datasets, demonstrate that our method significantly improves upon the state-of-the-art in regression-based pose estimation. More notably, ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
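
To make the idea of regressing coordinates without heatmaps concrete, the following is a minimal sketch, not the authors' implementation. It assumes one query vector per keypoint, plain scaled dot-product cross-attention over a flattened feature grid (a stand-in for the attention mechanism the abstract proposes), and a hypothetical linear head with a sigmoid that outputs normalized (x, y). All weights are random, so the outputs are meaningless; only the data flow is illustrated. The names `regress_keypoints`, `queries`, and `head` are illustrative assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def regress_keypoints(features, queries, head):
    """Cross-attend each keypoint query to the image features, then map
    the attended feature to normalized (x, y) coordinates.

    features: N feature vectors (a flattened H*W grid), each of dim D
    queries:  K query vectors (one per keypoint), each of dim D
    head:     two weight vectors of dim D (hypothetical linear head for x, y)
    """
    dim = len(queries[0])
    coords = []
    for q in queries:
        # Scaled dot-product attention weights over all feature locations.
        weights = softmax([dot(q, f) / math.sqrt(dim) for f in features])
        # Attention-weighted sum of the feature vectors.
        attended = [
            sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
        ]
        # The sigmoid maps each linear output to a coordinate between 0 and 1.
        x, y = (1.0 / (1.0 + math.exp(-dot(w_out, attended))) for w_out in head)
        coords.append((x, y))
    return coords


rng = random.Random(0)
D, N, K = 8, 16, 17  # feature dim, 4x4 feature grid, 17 COCO keypoints
features = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
queries = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(K)]
head = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(2)]

keypoints = regress_keypoints(features, queries, head)
print(len(keypoints))  # one (x, y) pair per keypoint
```

Because every step from features to coordinates is a differentiable operation, a loss on the predicted coordinates could be backpropagated end to end, in contrast to the non-differentiable argmax decoding typically applied to heatmaps.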
