Title
RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation
Authors
Abstract
This work considers identifying parameters characterizing a physical system's dynamic motion directly from a video whose rendering configurations are inaccessible. Existing solutions require massive training data or lack generalizability to unknown rendering configurations. We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem. Our core idea is to train a rendering-invariant state-prediction (RISP) network that transforms image differences into state differences independent of rendering configurations, e.g., lighting, shadows, or material reflectance. To train this predictor, we formulate a new loss on rendering variances using gradients from differentiable rendering. Moreover, we present an efficient, second-order method to compute the gradients of this loss, allowing it to be integrated seamlessly into modern deep learning frameworks. We evaluate our method in rigid-body and deformable-body simulation environments using four tasks: state estimation, system identification, imitation learning, and visuomotor control. We further demonstrate the efficacy of our approach on a real-world example: inferring the state and action sequences of a quadrotor from a video of its motion sequences. Compared with existing methods, our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
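The abstract's core idea, training a state predictor whose output is invariant to randomized rendering configurations, can be illustrated with a minimal toy sketch. Everything here is an illustrative assumption rather than the paper's implementation: the "renderer" is a linear map plus a config-dependent nuisance term (a stand-in for lighting or material variation), the predictor is linear, and training uses a plain L2 state-matching loss with SGD under domain randomization.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, IMG_DIM = 3, 16

# Toy "renderer": the image depends on the state through a fixed linear
# map W, plus a config-dependent offset (nuisance such as lighting).
W = rng.normal(size=(IMG_DIM, STATE_DIM))

def render(state, config):
    return W @ state + config

def mean_error(P, n=200):
    # Held-out prediction error of the linear predictor P, evaluated
    # under freshly randomized rendering configs.
    errs = []
    for _ in range(n):
        s = rng.normal(size=STATE_DIM)
        c = rng.normal(size=IMG_DIM)
        errs.append(np.linalg.norm(P @ render(s, c) - s))
    return float(np.mean(errs))

# Linear state predictor (a stand-in for the RISP network).
P = rng.normal(size=(STATE_DIM, IMG_DIM)) * 0.1
err_before = mean_error(P)

# Domain-randomized training: sample a rendering config per step and
# fit P so the predicted state matches ground truth regardless of it.
lr = 1e-3
for _ in range(5000):
    s = rng.normal(size=STATE_DIM)
    c = rng.normal(size=IMG_DIM)
    img = render(s, c)
    grad = np.outer(P @ img - s, img)  # d/dP of 0.5 * ||P img - s||^2
    P -= lr * grad

err_after = mean_error(P)
print(err_before, err_after)
```

After training, the predictor's error on images rendered with unseen configs drops well below its initial value, mirroring (in miniature) the invariance the paper trains its RISP network to achieve; the paper additionally supervises with gradients from a differentiable renderer, which this sketch omits.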