Paper Title
Routing Networks with Co-training for Continual Learning
Paper Authors
Paper Abstract
The core challenge with continual learning is catastrophic forgetting: when neural networks are trained on a sequence of tasks, they rapidly forget previously learned tasks. It has been observed that catastrophic forgetting is most severe when tasks are dissimilar to each other. We propose the use of sparse routing networks for continual learning. For each input, these network architectures activate a different path through a network of experts. Routing networks have been shown to learn to route similar tasks to overlapping sets of experts and dissimilar tasks to disjoint sets of experts. In the continual learning context, this behaviour is desirable, as it minimizes interference between dissimilar tasks while allowing positive transfer between related tasks. In practice, we find it necessary to develop a new training method for routing networks, which we call co-training, that avoids poorly initialized experts when new tasks are presented. When combined with a small episodic memory replay buffer, sparse routing networks with co-training outperform densely connected networks on the MNIST-Permutations and MNIST-Rotations benchmarks.
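A minimal sketch of the sparse routing idea described in the abstract, assuming a learned top-k gating router over a pool of linear experts. All names, dimensions, and the choice of linear experts are illustrative assumptions for exposition, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration.
N_EXPERTS, D_IN, D_OUT, TOP_K = 4, 784, 10, 2

# Each "expert" is a small linear map; the router scores experts per input.
experts = [rng.normal(0.0, 0.01, (D_IN, D_OUT)) for _ in range(N_EXPERTS)]
router = rng.normal(0.0, 0.01, (D_IN, N_EXPERTS))

def forward(x):
    """Sparse routing: activate only the TOP_K highest-scoring experts."""
    scores = x @ router                   # (N_EXPERTS,) routing logits
    top = np.argsort(scores)[-TOP_K:]     # indices of the selected experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over selected experts only
    # Output is the weighted sum of the chosen experts' outputs.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.normal(size=D_IN)
print(forward(x).shape)  # (10,)
```

Because unselected experts contribute nothing to the output, they would receive no gradient for that input, which is what limits interference between dissimilar tasks while still letting related tasks share experts. In the paper's setting, this routing is paired with a small episodic replay buffer and the co-training procedure to avoid routing new tasks to poorly initialized experts.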