Paper Title
Remember the Past: Distilling Datasets into Addressable Memories for Neural Networks
Paper Authors
Paper Abstract
We propose an algorithm that compresses the critical information of a large dataset into compact addressable memories. These memories can then be recalled to quickly re-train a neural network and recover the performance (instead of storing and re-training on the full original dataset). Building upon the dataset distillation framework, we make a key observation that a shared common representation allows for more efficient and effective distillation. Concretely, we learn a set of bases (aka "memories") which are shared between classes and combined through learned flexible addressing functions to generate a diverse set of training examples. This leads to several benefits: 1) the size of compressed data does not necessarily grow linearly with the number of classes; 2) an overall higher compression rate with more effective distillation is achieved; and 3) more generalized queries are allowed beyond recalling the original classes. We demonstrate state-of-the-art results on the dataset distillation task across six benchmarks, including up to 16.5% and 9.7% in retained accuracy improvement when distilling CIFAR10 and CIFAR100 respectively. We then leverage our framework to perform continual learning, achieving state-of-the-art results on four benchmarks, with 23.2% accuracy improvement on MANY. The code is released on our project webpage https://github.com/princetonvisualai/RememberThePast-DatasetDistillation.
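For intuition, the following is a minimal sketch of the shared-memory idea described in the abstract: a set of bases shared across classes is combined through learned addressing coefficients to produce synthetic training examples. All names, shapes, and the simple linear addressing scheme are illustrative assumptions and are not taken from the authors' released code; see the project repository above for the actual implementation.

# Illustrative PyTorch-style sketch (hypothetical names and shapes).
import torch
import torch.nn as nn

class AddressableMemory(nn.Module):
    """Shared bases ("memories") plus per-class addressing coefficients.

    Synthetic training images are linear combinations of the shared bases,
    so storage need not grow linearly with the number of classes.
    """
    def __init__(self, num_bases=64, image_shape=(3, 32, 32),
                 num_classes=10, examples_per_class=10):
        super().__init__()
        d = torch.tensor(image_shape).prod().item()
        # K shared bases, each a flattened image-sized vector.
        self.bases = nn.Parameter(torch.randn(num_bases, d) * 0.01)
        # Learned addressing coefficients: one (examples_per_class x K)
        # matrix per class -- the simplest form of an "addressing function".
        self.addresses = nn.Parameter(
            torch.randn(num_classes, examples_per_class, num_bases) * 0.01)
        self.image_shape = image_shape

    def recall(self):
        """Generate the synthetic training set (images and labels)."""
        num_classes, n, _ = self.addresses.shape
        # (C, n, K) @ (K, d) -> (C, n, d): each synthetic example is a
        # weighted combination of the shared memories.
        images = self.addresses @ self.bases
        images = images.view(num_classes * n, *self.image_shape)
        labels = torch.arange(num_classes).repeat_interleave(n)
        return images, labels

# Usage: the recalled set is used to re-train a fresh network. The bases and
# addresses themselves would be optimized with a dataset-distillation
# objective (e.g., a bi-level / meta-learning loss), which is omitted here.
memory = AddressableMemory()
images, labels = memory.recall()
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
loss = nn.functional.cross_entropy(net(images), labels)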