Paper Title
Contrastive Learning from Demonstrations
Paper Authors
Paper Abstract
This paper presents a framework for learning visual representations from unlabeled video demonstrations captured from multiple viewpoints. We show that these representations are applicable for imitating several robotic tasks, including pick and place. We optimize a recently proposed self-supervised learning algorithm by applying contrastive learning to enhance task-relevant information while suppressing irrelevant information in the feature embeddings. We validate the proposed method on the publicly available Multi-View Pouring data set and a custom Pick and Place data set, and compare it with the TCN triplet baseline. We evaluate the learned representations using three metrics: viewpoint alignment, stage classification, and reinforcement learning. In all cases the results improve over state-of-the-art approaches, with the added benefit of a reduced number of training iterations.
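To make the core idea concrete, below is a minimal sketch of a multi-view contrastive objective of the kind the abstract describes, assuming an InfoNCE-style loss in which frames captured at the same timestep from two viewpoints are positives and other timesteps in the batch serve as negatives. The encoder, batch layout, and temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multiview_infonce(z_view1, z_view2, temperature=0.1):
    """InfoNCE-style contrastive loss over two synchronized viewpoints.

    z_view1, z_view2: (N, D) embeddings of frames taken at the same N
    timesteps from two different cameras. Frames at the same timestep
    form positive pairs; all other timesteps act as negatives.
    """
    z1 = F.normalize(z_view1, dim=1)
    z2 = F.normalize(z_view2, dim=1)
    logits = z1 @ z2.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    # Symmetric cross-entropy: view1 -> view2 and view2 -> view1
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage sketch with a hypothetical toy encoder and random frames
if __name__ == "__main__":
    encoder = torch.nn.Sequential(
        torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))
    frames_v1 = torch.randn(32, 3, 64, 64)      # 32 timesteps, camera 1
    frames_v2 = torch.randn(32, 3, 64, 64)      # same timesteps, camera 2
    loss = multiview_infonce(encoder(frames_v1), encoder(frames_v2))
    loss.backward()
```

Training an embedding this way pulls together views of the same moment in the demonstration while pushing apart different moments, which is the property the viewpoint-alignment and stage-classification evaluations measure.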