The CLEAR Benchmark: Continual Learning on Real-World Imagery

Paper Title

The CLEAR Benchmark: Continual LEArning on Real-World Imagery

Authors

Zhiqiu Lin, Jia Shi, Deepak Pathak, Deva Ramanan

Abstract

Continual learning (CL) is widely regarded as a crucial challenge for lifelong AI. However, existing CL benchmarks, e.g. Permuted-MNIST and Split-CIFAR, make use of artificial temporal variation and do not align with or generalize to the real world. In this paper, we introduce CLEAR, the first continual image classification benchmark dataset with a natural temporal evolution of visual concepts in the real world that spans a decade (2004-2014). We build CLEAR from existing large-scale image collections (YFCC100M) through a novel and scalable low-cost approach to visio-linguistic dataset curation. Our pipeline makes use of pretrained vision-language models (e.g. CLIP) to interactively build labeled datasets, which are further validated with crowd-sourcing to remove errors and even inappropriate images (hidden in the original YFCC100M). The major strength of CLEAR over prior CL benchmarks is the smooth temporal evolution of visual concepts with real-world imagery, including both high-quality labeled data and abundant unlabeled samples per time period for continual semi-supervised learning. We find that a simple unsupervised pre-training step can already boost state-of-the-art CL algorithms that only utilize fully-supervised data. Our analysis also reveals that mainstream CL evaluation protocols that train and test on iid data artificially inflate the performance of CL systems. To address this, we propose novel "streaming" protocols for CL that always test on the (near) future. Interestingly, streaming protocols (a) can simplify dataset curation, since today's test set can be repurposed as tomorrow's train set, and (b) can produce more generalizable models with more accurate estimates of performance, since all labeled data from each time period is used for both training and testing (unlike classic iid train-test splits).
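
The "streaming" idea in the abstract (always test on the near future, then reuse that test data for training) can be illustrated with a minimal Python sketch. This is only an assumption-laden illustration, not the authors' implementation: the function names `streaming_splits` and `evaluate_streaming`, the cumulative use of past buckets, and the toy model are all hypothetical, and the abstract does not specify the number of time buckets or the training procedure.

```python
from typing import Any, Callable, List, Sequence, Tuple


def streaming_splits(num_buckets: int) -> List[Tuple[List[int], int]]:
    """Return (train_bucket_indices, test_bucket_index) pairs.

    Assumption: at step t the model has seen buckets 0..t and is tested
    on bucket t + 1, i.e. on data from the near future.
    """
    return [(list(range(t + 1)), t + 1) for t in range(num_buckets - 1)]


def evaluate_streaming(
    buckets: Sequence[Any],
    train_step: Callable[[Any, Any], Any],
    accuracy: Callable[[Any, Any], float],
    model: Any,
) -> List[float]:
    """Train on bucket t, then score on bucket t + 1, for each t in order.

    Each bucket after the first is used as a test set before it is used
    for training, so no separate held-out split per bucket is needed.
    """
    scores = []
    for t in range(len(buckets) - 1):
        model = train_step(model, buckets[t])           # learn from the present
        scores.append(accuracy(model, buckets[t + 1]))  # test on the near future
    return scores


# Toy usage (hypothetical): the "model" is the set of labels seen so far,
# and "accuracy" is the fraction of future items that are already known.
toy_buckets = [[1, 2], [2, 3], [3, 4]]
toy_scores = evaluate_streaming(
    toy_buckets,
    train_step=lambda seen, bucket: seen | set(bucket),
    accuracy=lambda seen, bucket: sum(x in seen for x in bucket) / len(bucket),
    model=set(),
)
print(toy_scores)
```

In this sketch the first bucket is only trained on and the last bucket is only tested on; every bucket in between serves both roles, which is the property the abstract contrasts with classic iid train-test splits.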
