Paper Title

Dataset Meta-Learning from Kernel Ridge-Regression

Authors

Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Abstract

One of the most fundamental aspects of any machine learning algorithm is the training data used by the algorithm. We introduce the novel concept of $ε$-approximation of datasets, obtaining datasets which are much smaller than or are significant corruptions of the original training data while maintaining similar model performance. We introduce a meta-learning algorithm called Kernel Inducing Points (KIP) for obtaining such remarkable datasets, inspired by recent developments in the correspondence between infinitely-wide neural networks and kernel ridge-regression (KRR). For KRR tasks, we demonstrate that KIP can compress datasets by one or two orders of magnitude, significantly improving on previous dataset distillation and subset selection methods while obtaining state-of-the-art results for MNIST and CIFAR-10 classification. Furthermore, our KIP-learned datasets are transferable to the training of finite-width neural networks even beyond the lazy-training regime, which leads to state-of-the-art results for neural network dataset distillation with potential applications to privacy preservation.
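To make the kernel ridge-regression component concrete: KIP works by differentiating a KRR loss with respect to a small learned "support" set. Below is a minimal numpy sketch of the KRR fit-and-predict step that such a loss is built on; the RBF kernel, function names, and hyperparameter values are illustrative assumptions, not the paper's actual (neural tangent) kernel or code.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    # Stand-in for the neural tangent kernels used in the paper.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def krr_fit_predict(X_support, y_support, X_target, reg=1e-3, gamma=1.0):
    # Kernel ridge regression: solve (K_ss + reg * I) alpha = y_support,
    # then predict on targets via the cross-kernel K_ts @ alpha.
    n = len(X_support)
    K_ss = rbf_kernel(X_support, X_support, gamma)
    alpha = np.linalg.solve(K_ss + reg * np.eye(n), y_support)
    K_ts = rbf_kernel(X_target, X_support, gamma)
    return K_ts @ alpha
```

In KIP, `X_support` (and optionally its labels) would be the small learned dataset, and the meta-objective is the KRR prediction error on the original training targets, minimized by gradient descent through this closed-form solve.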
