Paper Title

Conditional Channel Gated Networks for Task-Aware Continual Learning

Paper Authors

Davide Abati, Jakub Tomczak, Tijmen Blankevoort, Simone Calderara, Rita Cucchiara, Babak Ehteshami Bejnordi

Paper Abstract

Convolutional Neural Networks experience catastrophic forgetting when optimized on a sequence of learning problems: as they meet the objective of the current training examples, their performance on previous tasks drops drastically. In this work, we introduce a novel framework to tackle this problem with conditional computation. We equip each convolutional layer with task-specific gating modules that select which filters to apply to the given input. This way, we achieve two appealing properties. Firstly, the execution patterns of the gates allow us to identify and protect important filters, ensuring no loss in the performance of the model on previously learned tasks. Secondly, by using a sparsity objective, we can promote the selection of a limited set of kernels, retaining sufficient model capacity to digest new tasks. Existing solutions require, at test time, awareness of the task to which each example belongs. This knowledge, however, may not be available in many practical scenarios. Therefore, we additionally introduce a task classifier that predicts the task label of each example, to handle settings in which a task oracle is not available. We validate our proposal on four continual learning datasets. Results show that our model consistently outperforms existing methods, both in the presence and in the absence of a task oracle. Notably, on the Split SVHN and Imagenet-50 datasets, our model yields up to 23.98% and 17.42% improvements in accuracy w.r.t. competing methods.
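
For readers who want a concrete picture of the mechanism the abstract describes, below is a minimal PyTorch sketch of a task-conditional channel-gated convolution with a sparsity penalty. It is an illustrative sketch under stated assumptions, not the authors' implementation: the names `ChannelGate`, `GatedConv`, and `task_id`, as well as the 16-unit gating MLP, are hypothetical, and a straight-through sigmoid stands in for the differentiable relaxation the paper uses to train binary gates.

```python
import torch
import torch.nn as nn


class ChannelGate(nn.Module):
    """Per-task gating head: pools the input feature map and emits one
    binary on/off decision per output channel of the guarded convolution."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_channels, 16),
            nn.ReLU(inplace=True),
            nn.Linear(16, out_channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.mlp(x.mean(dim=(2, 3)))  # global average pooling
        soft = torch.sigmoid(logits)
        # Straight-through estimator: hard 0/1 gates in the forward pass,
        # sigmoid gradients in the backward pass.
        hard = (soft > 0.5).float()
        return hard + soft - soft.detach()


class GatedConv(nn.Module):
    """Convolution whose output channels are masked by a task-specific gate.
    One independent gating head per task, chosen by `task_id` at run time."""

    def __init__(self, in_channels: int, out_channels: int, num_tasks: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.gates = nn.ModuleList(
            [ChannelGate(in_channels, out_channels) for _ in range(num_tasks)]
        )

    def forward(self, x: torch.Tensor, task_id: int):
        g = self.gates[task_id](x)               # (batch, out_channels)
        y = self.conv(x) * g[:, :, None, None]   # zero out unselected filters
        # Mean gate activation doubles as a sparsity penalty: adding it to
        # the task loss encourages each task to claim few filters, leaving
        # capacity free for tasks that arrive later.
        sparsity_loss = g.mean()
        return y, sparsity_loss


# Example: task 0 selects its own subset of the 64 filters.
layer = GatedConv(in_channels=3, out_channels=64, num_tasks=5)
y, sparsity = layer(torch.randn(2, 3, 32, 32), task_id=0)
```

At test time with a task oracle, `task_id` is simply given; in the task-agnostic setting, the paper's task classifier would first predict the task label from the input, and the predicted label would select the gating head.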
