Paper Title
Human Motion Transfer from Poses in the Wild
Paper Authors
Paper Abstract
In this paper, we tackle the problem of human motion transfer, where we synthesize a novel motion video for a target person that imitates the movement from a reference video. It is a video-to-video translation task in which the estimated poses are used to bridge the two domains. Despite substantial progress on the topic, several problems remain with previous methods. First, there is a domain gap between training and testing pose sequences: the model is tested on poses it has not seen during training, such as difficult dancing moves. Furthermore, pose detection errors are inevitable, making the job of the generator harder. Finally, generating realistic pixels from sparse poses is challenging in a single step. To address these challenges, we introduce a novel pose-to-video translation framework for generating high-quality videos that are temporally coherent even for in-the-wild pose sequences unseen during training. We propose a pose augmentation method to minimize the training-test gap, a unified paired and unpaired learning strategy to improve robustness to detection errors, and a two-stage network architecture to achieve superior texture quality. To further boost research on the topic, we build two human motion datasets. Finally, we show the superiority of our approach over state-of-the-art methods through extensive experiments and evaluations on different datasets.
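The abstract does not specify how the pose augmentation is implemented, so the following is only a hedged sketch of one plausible form: perturbing 2D keypoint coordinates with small jitter and randomly dropping joints to mimic detector noise and unseen pose variations. The function name, parameters, and the (J, 2) keypoint layout are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def augment_pose(keypoints, jitter_std=0.01, drop_prob=0.05, rng=None):
    """Hypothetical pose augmentation (illustrative, not the paper's method).

    keypoints: (J, 2) array of normalized (x, y) joint positions.
    Adds Gaussian jitter to every joint and marks randomly dropped
    joints with NaN, the way a detector miss might appear.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = keypoints + rng.normal(0.0, jitter_std, size=keypoints.shape)
    dropped = rng.random(len(out)) < drop_prob  # per-joint drop mask
    out[dropped] = np.nan
    return out

# Example: 18 COCO-style joints with coordinates in [0, 1]
pose = np.random.default_rng(0).random((18, 2))
aug = augment_pose(pose, rng=np.random.default_rng(1))
```

Applied on the fly during training, such perturbations would expose the generator to noisier and more varied pose inputs than the clean detected sequences alone, which is consistent with the stated goal of narrowing the training-test gap.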