Paper Title
Towards High Performance Human Keypoint Detection
Paper Authors
Paper Abstract
Human keypoint detection from a single image is very challenging due to occlusion, blur, illumination, and scale variance. In this paper, we address this problem from three aspects: devising an efficient network structure, proposing three effective training strategies, and exploiting four useful postprocessing techniques. First, we find that context information plays an important role in reasoning about human body configuration and invisible keypoints. Inspired by this, we propose a cascaded context mixer (CCM), which efficiently integrates spatial and channel context information and progressively refines it. Second, to maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy that exploits abundant unlabeled data. These strategies enable CCM to learn discriminative features from a large number of diverse poses. Third, we present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy. Extensive experiments on the MS COCO keypoint detection benchmark demonstrate the superiority of the proposed method over representative state-of-the-art (SOTA) methods. Our single model achieves performance comparable to that of the winner of the 2018 COCO Keypoint Detection Challenge. The final ensemble model sets a new SOTA on this benchmark.
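To make the idea of sub-pixel refinement concrete, the sketch below shows one widely used heuristic for heatmap-based keypoint decoding: take the integer peak of a heatmap and nudge it a quarter pixel toward the stronger neighboring response. This is an illustration only and is not claimed to be one of the techniques proposed in the paper; the function name `refine_keypoint` and the `shift` parameter are assumptions introduced here.

```python
import numpy as np


def refine_keypoint(heatmap, shift=0.25):
    """Return the (x, y) peak of a 2-D heatmap with a sub-pixel nudge.

    Hypothetical helper: the integer argmax is moved by `shift` pixels
    toward whichever horizontal and vertical neighbor is stronger.
    """
    h, w = heatmap.shape
    # Integer location of the maximum response (row, column).
    iy, ix = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    x, y = float(ix), float(iy)
    # Only refine when both neighbors exist along the axis.
    if 0 < ix < w - 1:
        x += shift * float(np.sign(heatmap[iy, ix + 1] - heatmap[iy, ix - 1]))
    if 0 < iy < h - 1:
        y += shift * float(np.sign(heatmap[iy + 1, ix] - heatmap[iy - 1, ix]))
    return x, y
```

The refined coordinates are expressed in heatmap pixels, so they would still need to be scaled back to the input image resolution before evaluation.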