Paper Title
CoSCL: Cooperation of Small Continual Learners is Stronger than a Big One
Paper Authors
Paper Abstract
Continual learning requires incremental compatibility with a sequence of tasks. However, the design of the model architecture remains an open question: in general, learning all tasks with a shared set of parameters suffers from severe interference between tasks, while learning each task with a dedicated parameter subspace is limited by scalability. In this work, we theoretically analyze the generalization errors for learning plasticity and memory stability in continual learning, which can be uniformly upper-bounded by (1) the discrepancy between task distributions, (2) the flatness of the loss landscape, and (3) the coverage of the parameter space. Then, inspired by the robust biological learning system that processes sequential experiences with multiple parallel compartments, we propose Cooperation of Small Continual Learners (CoSCL) as a general strategy for continual learning. Specifically, we present an architecture with a fixed number of narrower sub-networks that learn all incremental tasks in parallel, which can naturally reduce the two errors by improving the three components of the upper bound. To strengthen this advantage, we encourage these sub-networks to cooperate by penalizing differences between the predictions made from their feature representations. With a fixed parameter budget, CoSCL improves a variety of representative continual learning approaches by a large margin (e.g., up to 10.64% on CIFAR-100-SC, 9.33% on CIFAR-100-RS, 11.45% on CUB-200-2011, and 6.72% on Tiny-ImageNet) and achieves new state-of-the-art performance.
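To make the idea described in the abstract concrete, below is a minimal sketch of an ensemble of narrow sub-networks that learn in parallel under a fixed parameter budget, together with a cooperation penalty on their prediction disagreement. This is an illustrative PyTorch-style assumption, not the paper's exact architecture or loss: the layer widths, the number of learners, the pairwise KL form of the penalty, and the weight `lambda_coop` are hypothetical choices.

```python
# Minimal sketch of the CoSCL idea, assuming a PyTorch-style setup.
# Widths, learner count, and the cooperation loss form are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NarrowSubNet(nn.Module):
    """One narrow feature extractor; several of these share a fixed budget."""
    def __init__(self, in_dim, width, feat_dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, width), nn.ReLU(),
            nn.Linear(width, feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.body(x)


class CoSCLSketch(nn.Module):
    """A fixed number of narrow sub-networks learning all tasks in parallel."""
    def __init__(self, in_dim=784, width=64, feat_dim=64,
                 num_learners=5, num_classes=10):
        super().__init__()
        self.learners = nn.ModuleList(
            NarrowSubNet(in_dim, width, feat_dim) for _ in range(num_learners)
        )
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in range(num_learners)
        )

    def forward(self, x):
        logits = [head(learner(x))
                  for learner, head in zip(self.learners, self.heads)]
        return torch.stack(logits)  # (num_learners, batch, num_classes)


def cooperation_loss(all_logits):
    """Penalize disagreement between sub-network predictions (symmetric KL)."""
    probs = F.softmax(all_logits, dim=-1)
    log_probs = F.log_softmax(all_logits, dim=-1)
    loss, n = 0.0, all_logits.size(0)
    for i in range(n):
        for j in range(n):
            if i != j:
                loss = loss + F.kl_div(log_probs[i], probs[j],
                                       reduction="batchmean")
    return loss / (n * (n - 1))


# Illustrative training step: task loss on the averaged prediction plus the
# cooperation penalty (lambda_coop is a hypothetical hyperparameter).
model = CoSCLSketch()
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
all_logits = model(x)
lambda_coop = 0.1
loss = F.cross_entropy(all_logits.mean(dim=0), y) \
       + lambda_coop * cooperation_loss(all_logits)
loss.backward()
```

In this sketch, any representative continual learning method (e.g., a regularization- or replay-based approach, as the abstract suggests) would be applied on top of this ensemble; the cooperation term only couples the sub-networks' predictions so they behave as a single stronger learner.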