GradMix: Multi-source Transfer across Domains and Tasks

Paper title

GradMix: Multi-source Transfer across Domains and Tasks

Authors

Junnan Li, Ziwei Xu, Yongkang Wong, Qi Zhao, Mohan Kankanhalli

Abstract

The computer vision community is witnessing an unprecedented rate of new tasks being proposed and addressed, thanks to the capability of deep convolutional networks to find complex mappings from X to Y. The advent of each task is often accompanied by the release of a large-scale annotated dataset for supervised training of deep networks. However, manually labeling a sufficient amount of training data is expensive and time-consuming. Therefore, it is important to develop algorithms that can leverage off-the-shelf labeled datasets to learn useful knowledge for the target task. While previous works mostly focus on transfer learning from a single source, we study multi-source transfer across domains and tasks (MS-DTT) in a semi-supervised setting. We propose GradMix, a model-agnostic method applicable to any model trained with a gradient-based learning rule, which transfers knowledge via gradient descent by weighting and mixing the gradients from all sources during training. GradMix follows a meta-learning objective that assigns layer-wise weights to the source gradients, such that the combined gradient follows the direction that minimizes the loss on a small set of samples from the target dataset. In addition, we propose to adaptively adjust the learning rate for each mini-batch based on its importance to the target task, together with a pseudo-labeling method that leverages the unlabeled samples in the target domain. We conduct MS-DTT experiments on two tasks, digit recognition and action recognition, and demonstrate the advantageous performance of the proposed method against multiple baselines.
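
The idea of weighting and mixing source gradients can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes a one-layer linear model with squared error, a single weight per source instead of layer-wise weights, and a first-order rule in which a source's weight is proportional to the clipped inner product between its gradient and the gradient on the small labeled target set. All function names (`mse_gradient`, `mix_weights`, `mixed_gradient_step`) are hypothetical.

```python
# Illustrative sketch only. The weighting rule and all names here are
# assumptions made for this example, not taken from the paper.

def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def mse_gradient(params, batch):
    """Gradient of the mean squared error of a linear model y = params . x."""
    grad = [0.0] * len(params)
    for x, y in batch:
        err = dot(params, x) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def mix_weights(source_grads, target_grad):
    """Weight each source by how well its gradient aligns with the target's.

    To first order, a step along a source gradient lowers the target loss
    only if the two gradients have a positive inner product, so negative
    alignments are clipped to zero before normalising.
    """
    scores = [max(0.0, dot(g, target_grad)) for g in source_grads]
    total = sum(scores)
    if total == 0.0:
        return [0.0] * len(source_grads)
    return [s / total for s in scores]


def mixed_gradient_step(params, source_batches, target_batch, lr=0.1):
    """One gradient-descent update using the weighted mix of source gradients."""
    source_grads = [mse_gradient(params, b) for b in source_batches]
    target_grad = mse_gradient(params, target_batch)
    weights = mix_weights(source_grads, target_grad)
    mixed = [
        sum(w * g[i] for w, g in zip(weights, source_grads))
        for i in range(len(params))
    ]
    new_params = [p - lr * m for p, m in zip(params, mixed)]
    return new_params, weights


if __name__ == "__main__":
    helpful = [([1.0], 2.0), ([2.0], 4.0)]   # follows y = 2x, like the target
    harmful = [([1.0], -2.0)]                # follows y = -2x
    target = [([1.0], 2.0)]                  # small labeled target set
    params, weights = mixed_gradient_step([0.0], [helpful, harmful], target)
    print(params, weights)
```

In this toy setup the source whose gradient opposes the target gradient receives zero weight, so the update is driven entirely by the aligned source. A fuller treatment would compute separate weights for each layer and would add the adaptive learning rate and pseudo-labeling that the abstract describes.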
