Paper Title


Bi-tuning of Pre-trained Representations

Paper Authors

Jincheng Zhong, Ximei Wang, Zhi Kou, Jianmin Wang, Mingsheng Long

Paper Abstract


It is common within the deep learning community to first pre-train a deep neural network on a large-scale dataset and then fine-tune the pre-trained model to a specific downstream task. Recently, both supervised and unsupervised pre-training approaches to learning representations have achieved remarkable advances, which exploit the discriminative knowledge of labels and the intrinsic structure of data, respectively. It follows natural intuition that both the discriminative knowledge and the intrinsic structure of the downstream task can be useful for fine-tuning; however, existing fine-tuning methods mainly leverage the former and discard the latter. A question arises: how can the intrinsic structure of data be fully explored to boost fine-tuning? In this paper, we propose Bi-tuning, a general learning framework for fine-tuning both supervised and unsupervised pre-trained representations to downstream tasks. Bi-tuning generalizes vanilla fine-tuning by integrating two heads upon the backbone of pre-trained representations: a classifier head with an improved contrastive cross-entropy loss to better leverage the label information in an instance-contrast way, and a projector head with a newly designed categorical contrastive learning loss to fully exploit the intrinsic structure of data in a category-consistent way. Comprehensive experiments confirm that Bi-tuning achieves state-of-the-art results for fine-tuning tasks of both supervised and unsupervised pre-trained models by large margins (e.g., a 10.7% absolute rise in accuracy on CUB in the low-data regime).
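To make the two-head design in the abstract concrete, below is a minimal sketch in PyTorch, assuming a ResNet-50 backbone with ImageNet weights, a 128-dimensional projector, and the class name `BiTuningSketch` — all illustrative choices, not the paper's released code. Only the module layout (pre-trained backbone plus classifier head and projector head) is shown; the improved contrastive cross-entropy and the categorical contrastive learning losses from the paper are not reproduced here.

```python
# Illustrative sketch of the Bi-tuning architecture described in the abstract.
# Assumptions: ResNet-50 backbone, 128-d projector, PyTorch/torchvision API.
import torch
import torch.nn as nn
import torchvision.models as models


class BiTuningSketch(nn.Module):
    """Pre-trained backbone with a classifier head and a projector head."""

    def __init__(self, num_classes: int, proj_dim: int = 128):
        super().__init__()
        # Pre-trained backbone; supervised or unsupervised weights both fit here.
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # expose the penultimate features
        self.backbone = backbone
        # Classifier head: paired with the (improved) contrastive cross-entropy loss.
        self.classifier = nn.Linear(feat_dim, num_classes)
        # Projector head: embeddings for the categorical contrastive learning loss.
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        feat = self.backbone(x)
        logits = self.classifier(feat)
        embedding = nn.functional.normalize(self.projector(feat), dim=1)
        return logits, embedding


if __name__ == "__main__":
    model = BiTuningSketch(num_classes=200)  # e.g., CUB-200 as in the experiments
    images = torch.randn(4, 3, 224, 224)
    logits, embedding = model(images)
    print(logits.shape, embedding.shape)  # torch.Size([4, 200]) torch.Size([4, 128])
```

During fine-tuning, both heads are trained jointly on top of the shared backbone: the classifier output feeds the label-driven loss, while the normalized projector output feeds the structure-driven contrastive loss.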
