Title
RGBD-Dog: Predicting Canine Pose from RGBD Sensors
Authors
Abstract
The automatic extraction of animal 3D pose from images without markers is of interest in a range of scientific fields. Most work to date predicts animal pose from RGB images, based on 2D labelling of joint positions. However, due to the difficulty of obtaining training data, no ground truth dataset of 3D animal motion is available to quantitatively evaluate these approaches. In addition, a lack of 3D animal pose data also makes it difficult to train 3D pose-prediction methods in a manner similar to the popular field of human body-pose prediction. In our work, we focus on the problem of 3D canine pose estimation from RGBD images, recording a diverse range of dog breeds with several Microsoft Kinect v2s while simultaneously obtaining the 3D ground truth skeleton via a motion capture system. We generate a dataset of synthetic RGBD images from this data. A stacked hourglass network is trained to predict 3D joint locations, and its output is then constrained using prior models of shape and pose. We evaluate our model on both synthetic and real RGBD images and compare our results to previously published work fitting canine models to images. Finally, despite our training set consisting only of dog data, visual inspection implies that our network can produce good predictions for images of other quadrupeds -- e.g. horses or cats -- when their pose is similar to that contained in our training set.
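To illustrate the final step of the pipeline described above -- constraining a network's raw 3D joint prediction with a learned pose prior -- the sketch below projects a prediction onto a low-dimensional linear (PCA-style) subspace. This is only a minimal illustration under assumed conventions: the function name, the toy dimensions, and the use of a plain linear prior are assumptions for exposition, not the paper's actual shape/pose model or fitting procedure.

```python
import numpy as np

def constrain_with_prior(raw_joints, mean_pose, basis, n_components=5):
    """Project a raw (J, 3) joint prediction onto a linear pose prior.

    raw_joints : (J, 3) predicted 3D joint positions (hypothetical network output)
    mean_pose  : (J, 3) mean pose of the prior
    basis      : (K, J*3) matrix whose rows are orthonormal principal directions
    """
    flat = (raw_joints - mean_pose).reshape(-1)      # centre and flatten to (J*3,)
    coeffs = basis[:n_components] @ flat             # coordinates in the prior subspace
    constrained = basis[:n_components].T @ coeffs    # back-project onto the subspace
    return constrained.reshape(raw_joints.shape) + mean_pose

# Toy example: 4 joints, a random orthonormal basis standing in for a learned prior.
rng = np.random.default_rng(0)
num_joints = 4
mean = rng.normal(size=(num_joints, 3))
q, _ = np.linalg.qr(rng.normal(size=(num_joints * 3, num_joints * 3)))
basis = q.T                                          # rows are orthonormal
pred = rng.normal(size=(num_joints, 3))              # stand-in for a network prediction
out = constrain_with_prior(pred, mean, basis, n_components=5)
print(out.shape)  # (4, 3)
```

Because the projection is onto a fixed subspace, applying the constraint a second time leaves the result unchanged; in practice such a prior keeps predictions close to plausible dog poses while discarding off-manifold noise.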