Paper title
Self-supervised learning through the eyes of a child
Paper authors
Paper abstract
Within months of birth, children develop meaningful expectations about the world around them. How much of this early knowledge can be explained through generic learning mechanisms applied to sensory data, and how much of it requires more substantive innate inductive biases? Addressing this fundamental question in its full generality is currently infeasible, but we can hope to make real progress in more narrowly defined domains, such as the development of high-level visual categories, thanks to improvements in data collecting technology and recent progress in deep learning. In this paper, our goal is precisely to achieve such progress by utilizing modern self-supervised deep learning methods and a recent longitudinal, egocentric video dataset recorded from the perspective of three young children (Sullivan et al., 2020). Our results demonstrate the emergence of powerful, high-level visual representations from developmentally realistic natural videos using generic self-supervised learning objectives.