Paper Title
Compression-aware Continual Learning using Singular Value Decomposition
Paper Authors
Paper Abstract
We propose a compression-based continual task learning method that can dynamically grow a neural network. Inspired by recent model compression techniques, we employ compression-aware training and perform low-rank weight approximations using singular value decomposition (SVD) to achieve network compaction. By encouraging the network to learn low-rank weight filters, our method achieves compressed representations with minimal performance degradation and without the need for costly fine-tuning. Specifically, we decompose the weight filters using SVD and train the network on incremental tasks in this factorized form. Such a factorization allows us to impose sparsity-inducing regularizers directly on the singular values and to use fewer parameters for each task. We further introduce a novel shared representational space for learning across tasks. This encourages incoming tasks to learn only the residual task-specific information on top of previously learnt weight filters and greatly helps learning under fixed capacity constraints. Our method significantly outperforms prior continual learning approaches on three benchmark datasets, with accuracy improvements of 10.3%, 12.3%, and 15.6% over the state of the art on 20-split CIFAR-100, miniImageNet, and a 5-sequence dataset, respectively. Further, our method yields compressed models with approximately 3.64x, 2.88x, and 5.91x fewer parameters, respectively, on the above datasets compared to baseline individual-task models. Our source code is available at https://github.com/pavanteja295/CACL.
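The following is a minimal sketch of the core idea described in the abstract: storing a layer's weight in SVD-factorized form and placing a sparsity-inducing (L1) penalty on the singular values so that low-rank, compressible weights emerge during training. It is an illustrative example only, not the authors' implementation (see the linked repository for that); PyTorch, the `SVDLinear` class name, and the regularization weight are assumptions made for the sketch.

```python
# Sketch, under stated assumptions: a linear layer kept as W = U diag(s) V^T,
# with an L1 regularizer on s. Small singular values can later be pruned to
# compress the layer, following the general compression-aware training idea.
import torch
import torch.nn as nn


class SVDLinear(nn.Module):
    """Linear layer whose weight is stored in factorized form U @ diag(s) @ V^T."""

    def __init__(self, in_features, out_features, rank=None):
        super().__init__()
        rank = rank or min(in_features, out_features)
        # Initialize the factors from the SVD of a standard random weight matrix.
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w)
        u, s, vh = torch.linalg.svd(w, full_matrices=False)
        self.U = nn.Parameter(u[:, :rank])    # (out_features, rank)
        self.s = nn.Parameter(s[:rank])       # (rank,)
        self.Vh = nn.Parameter(vh[:rank, :])  # (rank, in_features)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Equivalent to x @ W.T + b with W = U @ diag(s) @ Vh.
        return ((x @ self.Vh.t()) * self.s) @ self.U.t() + self.bias

    def sparsity_penalty(self):
        # L1 penalty over the singular values drives many of them toward zero.
        return self.s.abs().sum()


# Usage: add the penalty to the task loss during training (dummy loss shown here).
layer = SVDLinear(256, 128)
x = torch.randn(32, 256)
out = layer(x)
loss = out.pow(2).mean() + 1e-3 * layer.sparsity_penalty()
loss.backward()
```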