Paper Title

Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks

Authors

Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite

Abstract


A striking observation about iterative magnitude pruning (IMP; Frankle et al., 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same accuracy as the dense network. However, the same does not hold at step 0, i.e., at random initialization. In this work, we seek to understand how this early phase of pre-training leads to a good initialization for IMP, through the lens of both the data distribution and the loss landscape geometry. Empirically, we observe that, holding the number of pre-training iterations constant, training on a small fraction of (randomly chosen) data suffices to obtain an equally good initialization for IMP. We additionally observe that by pre-training only on "easy" training data, we can decrease the number of steps necessary to find a good initialization for IMP compared to training on the full dataset or a randomly chosen subset. Finally, we identify novel properties of the loss landscape of dense networks that are predictive of IMP performance, showing in particular that more examples being linearly mode connected in the dense network correlates well with good initializations for IMP. Combined, these results provide new insight into the role played by early-phase training in IMP.
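For readers unfamiliar with the two procedures the abstract refers to, below is a minimal PyTorch-style sketch of IMP with weight rewinding, in the spirit of Frankle et al. (2020). The helper names (`build_model`, `train`) and hyperparameters are hypothetical placeholders for illustration, not the authors' code.

```python
# Minimal sketch of iterative magnitude pruning (IMP) with weight rewinding.
# `build_model` and `train` are hypothetical placeholders, not the paper's code.
import copy
import torch


def prune_by_magnitude(model, masks, fraction=0.2):
    """Prune the smallest-magnitude surviving weights; return updated masks."""
    surviving = torch.cat([
        p.detach().abs()[m.bool()]
        for p, m in zip(model.parameters(), masks)
    ])
    k = max(1, int(fraction * surviving.numel()))
    threshold = surviving.kthvalue(k).values  # k-th smallest surviving magnitude
    return [
        ((p.detach().abs() > threshold) & m.bool()).float()
        for p, m in zip(model.parameters(), masks)
    ]


def imp_with_rewinding(build_model, train, k_steps, total_steps, rounds=3):
    """Train dense for k steps, then iteratively train, prune, and rewind."""
    model = build_model()
    train(model, steps=k_steps)                       # brief dense pre-training
    rewind_state = copy.deepcopy(model.state_dict())  # weights at step k
    masks = [torch.ones_like(p) for p in model.parameters()]
    for _ in range(rounds):
        train(model, steps=total_steps - k_steps)     # train to completion
        masks = prune_by_magnitude(model, masks)      # drop smallest weights
        model.load_state_dict(rewind_state)           # rewind survivors to step k
        with torch.no_grad():                         # ...and re-apply the mask
            for p, m in zip(model.parameters(), masks):
                p.mul_(m)
    return model, masks                               # sparse "ticket" at step k
```

The loss-landscape analysis mentioned in the abstract builds on linear mode connectivity: two trained networks are linearly mode connected when the loss does not rise substantially along the straight line between their weights. A sketch of that barrier measurement, under the same placeholder assumptions (`evaluate` is a hypothetical loss-evaluation helper):

```python
def error_barrier(state_a, state_b, build_model, evaluate, points=11):
    """Peak loss rise along the linear path between two weight configurations.

    Assumes both state dicts come from the same architecture and contain
    only float tensors (integer buffers would be interpolated meaninglessly).
    """
    model = build_model()
    losses = []
    for alpha in torch.linspace(0.0, 1.0, points):
        interp = {k: (1 - alpha) * state_a[k] + alpha * state_b[k]
                  for k in state_a}
        model.load_state_dict(interp)
        losses.append(evaluate(model))  # e.g., test loss or error
    # Barrier: how far the path rises above the average of the endpoints.
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```

In a full implementation the masks would also be enforced during the training steps themselves (e.g., by zeroing gradients of pruned weights), and the paper's analysis applies connectivity measurements at the level of individual examples; both details are omitted from these sketches for brevity.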
