Paper Title
Deep Ensembles for Low-Data Transfer Learning
Paper Authors
Paper Abstract
In the low-data regime, it is difficult to train good supervised models from scratch. Instead, practitioners turn to pre-trained models, leveraging transfer learning. Ensembling is an empirically and theoretically appealing way to construct powerful predictive models, but the predominant approach of training multiple deep networks with different random initialisations collides with the need for transfer via pre-trained weights. In this work, we study different ways of creating ensembles from pre-trained models. We show that the nature of pre-training itself is a performant source of diversity, and propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset. The approach is simple: use nearest-neighbour accuracy to rank pre-trained models, fine-tune the best ones with a small hyperparameter sweep, and greedily construct an ensemble to minimise validation cross-entropy. When evaluated together with strong baselines on 19 different downstream tasks (the Visual Task Adaptation Benchmark), this achieves state-of-the-art performance at a much lower inference budget, even when selecting from over 2,000 pre-trained models. We also assess our ensembles on ImageNet variants and show improved robustness to distribution shift.
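As a rough illustration of the final step described in the abstract, the sketch below shows one way the greedy ensemble construction against validation cross-entropy could look. It is a minimal sketch, assuming each fine-tuned candidate model has already produced softmax probabilities on a held-out validation set; the function names (`greedy_ensemble`, `cross_entropy`), the maximum ensemble size, and the use of selection with replacement are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true labels under `probs`."""
    eps = 1e-12
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

def greedy_ensemble(candidate_probs, labels, max_size=5):
    """Greedily add the candidate whose inclusion most reduces validation
    cross-entropy (Caruana-style selection; repeats are allowed, which
    effectively re-weights a model).

    candidate_probs: list of (n_val, n_classes) arrays of softmax outputs,
                     one per fine-tuned candidate model.
    labels:          (n_val,) array of integer class labels.
    Returns the indices of the selected candidates, in selection order.
    """
    selected = []
    best_loss = np.inf
    ensemble_sum = np.zeros_like(candidate_probs[0])

    for _ in range(max_size):
        best_idx = None
        for i, probs in enumerate(candidate_probs):
            # Trial ensemble: average of already-selected members plus candidate i.
            trial = (ensemble_sum + probs) / (len(selected) + 1)
            loss = cross_entropy(trial, labels)
            if loss < best_loss:
                best_loss, best_idx = loss, i
        if best_idx is None:  # no candidate improves the ensemble further
            break
        selected.append(best_idx)
        ensemble_sum += candidate_probs[best_idx]

    return selected
```

In this sketch the candidates would be the top fine-tuned models from the nearest-neighbour ranking and hyperparameter sweep; the greedy loop then keeps only the subset whose averaged predictions minimise validation cross-entropy, which is what keeps the inference budget low.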