Title
MUG: Multi-human Graph Network for 3D Mesh Reconstruction from 2D Pose
Authors
Abstract
Reconstructing multi-human body meshes from a single monocular image is an important but challenging computer vision problem. In addition to the individual body mesh models, we need to estimate relative 3D positions among subjects to generate a coherent representation. In this work, through a single graph neural network, named MUG (Multi-hUman Graph network), we construct coherent multi-human meshes using only multi-human 2D pose as input. Existing methods adopt a detection-style pipeline (i.e., extracting image features, then locating human instances and recovering body meshes from them) and suffer from the significant domain gap between lab-collected training datasets and in-the-wild testing datasets; in contrast, our method benefits from 2D pose, which has relatively consistent geometric properties across datasets. Our method works as follows. First, to model the multi-human environment, it processes multi-human 2D poses and builds a novel heterogeneous graph, where nodes from different people and within one person are connected to capture inter-human interactions and encode the body geometry (i.e., skeleton and mesh structure). Second, it employs a dual-branch graph neural network structure -- one branch for predicting inter-human depth relations and the other for predicting root-joint-relative mesh coordinates. Finally, the complete multi-human 3D meshes are constructed by combining the outputs of both branches. Extensive experiments demonstrate that MUG outperforms previous multi-human mesh estimation methods on standard 3D human benchmarks -- Panoptic, MuPoTS-3D and 3DPW.
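The two steps the abstract describes -- connecting joint nodes both within and across people into one heterogeneous graph, and merging the depth-relation branch with the root-relative mesh branch -- can be sketched as follows. This is an illustrative sketch, not the authors' code: the toy 5-joint skeleton, the all-pairs inter-person edge rule, and the array shapes are assumptions made for demonstration.

```python
import numpy as np

# Toy skeleton: 5 joints per person (root plus four limbs) -- an
# assumption for illustration, not the paper's actual joint set.
SKELETON_EDGES = [(0, 1), (0, 2), (0, 3), (0, 4)]
NUM_JOINTS = 5


def build_hetero_graph(num_people):
    """Return a directed edge list over all joint nodes:
    intra-person edges follow the body skeleton, while inter-person
    edges connect every joint of one person to every joint of each
    other person (one plausible realisation of the abstract's
    cross-person connectivity)."""
    edges = []
    for p in range(num_people):
        off = p * NUM_JOINTS
        # intra-person edges: body geometry (skeleton structure)
        edges += [(off + a, off + b) for a, b in SKELETON_EDGES]
    for p in range(num_people):
        for q in range(num_people):
            if p == q:
                continue
            # inter-person edges: capture human-human interactions
            for i in range(NUM_JOINTS):
                for j in range(NUM_JOINTS):
                    edges.append((p * NUM_JOINTS + i, q * NUM_JOINTS + j))
    return edges


def assemble_meshes(root_depths, rel_meshes):
    """Combine the two branch outputs: shift each person's
    root-relative mesh vertices (N_v, 3) along z by that person's
    predicted root depth, yielding coherent absolute meshes."""
    absolute = []
    for z, mesh in zip(root_depths, rel_meshes):
        shifted = mesh.copy()
        shifted[:, 2] += z  # place the person at its predicted depth
        absolute.append(shifted)
    return absolute
```

For two people this yields 8 skeleton edges plus 50 cross-person edges; a real model would then run message passing over this graph before the two prediction heads.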