Paper Title
Optimizing generalization on the train set: a novel gradient-based framework to train parameters and hyperparameters simultaneously
Paper Authors
Paper Abstract
概括是机器学习中的一个核心问题。大多数预测方法需要仔细校准在Hold-Out \ textIt {验证}数据集上进行的超参数以实现概括。本文的主要目的是基于一种新的风险衡量标准提出一种新颖的方法,该方法使我们能够开发新颖的全自动程序进行概括。我们说明了回归问题中这个新框架的相关性。这种新方法的主要优点是:(i)它可以同时训练模型并在所有可用数据的基于梯度的优化器的单一运行中执行正则化,而无需任何以前的高参数调整; (ii)此框架可以通过$引入正则化参数同时解决几个其他目标(相关,稀疏,...)$。值得注意的是,我们的方法将高参数调整以及特征选择(组合离散优化问题)转化为连续优化问题,该问题可通过经典的基于基于梯度的方法解决。 (iii)我们方法的计算复杂性是$ o(npk)$,其中$ n,p,k $分别表示梯度下降算法的观测,特征和迭代次数。与基准方法相比,我们在实验中观察到的方法的运行时间明显较小。我们的过程是在Pytorch中实现的(可以复制代码)。
Generalization is a central problem in Machine Learning. Most prediction methods require careful calibration of hyperparameters, carried out on a hold-out \textit{validation} dataset, to achieve generalization. The main goal of this paper is to present a novel approach, based on a new measure of risk, that allows us to develop fully automatic procedures for generalization. We illustrate the pertinence of this new framework on the regression problem. The main advantages of this new approach are: (i) it can simultaneously train the model and perform regularization in a single run of a gradient-based optimizer on all available data, without any prior hyperparameter tuning; (ii) this framework can tackle several additional objectives simultaneously (correlation, sparsity, ...) \textit{via} the introduction of regularization parameters. Notably, our approach transforms hyperparameter tuning, as well as feature selection (a combinatorial discrete optimization problem), into a continuous optimization problem that is solvable via classical gradient-based methods; (iii) the computational complexity of our methods is $O(npK)$, where $n$, $p$, and $K$ denote respectively the number of observations, features, and iterations of the gradient descent algorithm. In our experiments we observe a significantly smaller runtime for our methods than for benchmark methods with equivalent prediction scores. Our procedures are implemented in PyTorch (code is available for replication).
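To make concrete the claim that hyperparameter tuning becomes a continuous, gradient-solvable problem, here is a minimal PyTorch sketch. It is not the paper's procedure: the new risk measure is not reproduced here, and a generalized cross-validation (GCV) score on a toy ridge-regression problem stands in as the differentiable criterion; the data and all names below are illustrative assumptions. The point is only the mechanism: the regularization strength is a tensor with requires_grad=True, kept positive through a softplus, and tuned by an ordinary gradient-based optimizer in a single run on all available data, with no hold-out validation set.

```python
import torch

# Hedged sketch: continuous, gradient-based tuning of a regularization
# hyperparameter. GCV is used as a stand-in differentiable criterion;
# it is NOT the risk measure introduced in the paper.

torch.manual_seed(0)
n, p = 200, 10                                    # observations, features
X = torch.randn(n, p)
beta_true = torch.cat([torch.tensor([2.0, -1.0, 0.5]), torch.zeros(p - 3)])
y = X @ beta_true + 0.1 * torch.randn(n)

log_lam = torch.zeros(1, requires_grad=True)      # unconstrained hyperparameter
optimizer = torch.optim.Adam([log_lam], lr=0.1)

for _ in range(200):                              # K iterations of gradient descent
    optimizer.zero_grad()
    lam = torch.nn.functional.softplus(log_lam)   # keep lambda > 0
    # Ridge hat matrix H(lambda) = X (X'X + lambda I)^{-1} X'
    H = X @ torch.linalg.solve(X.T @ X + lam * torch.eye(p), X.T)
    resid = y - H @ y
    # GCV(lambda): differentiable in lambda, minimized by gradient descent
    gcv = resid.pow(2).mean() / (1 - torch.trace(H) / n) ** 2
    gcv.backward()
    optimizer.step()

lam = torch.nn.functional.softplus(log_lam).detach()
beta_hat = torch.linalg.solve(X.T @ X + lam * torch.eye(p), X.T @ y)
print(f"selected lambda = {lam.item():.4f}")
print("refit coefficients:", beta_hat)
```

Replacing the GCV stand-in with the paper's risk measure and adding the model parameters themselves to the optimizer's parameter list would correspond to the simultaneous training of parameters and hyperparameters described in the abstract.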