Paper Title

Evolving Losses for Unsupervised Video Representation Learning

Authors

AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo

Abstract

We present a new method to learn video representations from large-scale unlabeled video data. Ideally, this representation will be generic and transferable, directly usable for new tasks such as action recognition and zero- or few-shot learning. We formulate unsupervised representation learning as a multi-modal, multi-task learning problem, where the representations are shared across different modalities via distillation. Further, we introduce the concept of loss function evolution, using an evolutionary search algorithm to automatically find an optimal combination of loss functions capturing many (self-supervised) tasks and modalities. Third, we propose an unsupervised representation evaluation metric that uses distribution matching to a large unlabeled dataset as a prior constraint, based on Zipf's law. This unsupervised constraint, which is not guided by any labeling, produces results similar to weakly-supervised, task-specific ones. The proposed unsupervised representation learning yields a single RGB network that outperforms previous methods. Notably, it is also more effective than several label-based methods (e.g., ImageNet), with the exception of large, fully labeled video datasets.
