Paper Title

Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise

Authors

Yihao Xue, Kyle Whitecross, Baharan Mirzasoleiman

Abstract

Increasing the size of overparameterized neural networks has been key to achieving state-of-the-art performance. This is captured by the double descent phenomenon, where the test loss follows a decreasing-increasing-decreasing pattern (or sometimes monotonically decreasing) as model width increases. However, the effect of label noise on the test loss curve has not been fully explored. In this work, we uncover an intriguing phenomenon where label noise leads to a \textit{final ascent} in the originally observed double descent curve. Specifically, under a sufficiently large noise-to-sample-size ratio, optimal generalization is achieved at intermediate widths. Through theoretical analysis, we attribute this phenomenon to the shape transition of test loss variance induced by label noise. Furthermore, we extend the final ascent phenomenon to model density and provide the first theoretical characterization showing that reducing density by randomly dropping trainable parameters improves generalization under label noise. We also thoroughly examine the roles of regularization and sample size. Surprisingly, we find that larger $\ell_2$ regularization and robust learning methods against label noise exacerbate the final ascent. We confirm the validity of our findings through extensive experiments on ReLU networks trained on MNIST, ResNets/ViTs trained on CIFAR-10/100, and InceptionResNet-v2 trained on Stanford Cars with real-world noisy labels.
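The experiments described above rely on training under a controlled noise-to-sample-size ratio. A common way to set this up is symmetric label noise, where a fixed fraction of training labels is flipped to a uniformly random different class. The sketch below is illustrative only; the function name and defaults are not from the paper:

```python
import numpy as np


def inject_symmetric_label_noise(labels, noise_rate, num_classes, seed=0):
    """Return a copy of `labels` with a `noise_rate` fraction flipped.

    Each selected label is replaced by a uniformly random *different*
    class, which is the standard symmetric-noise model used in
    label-noise experiments.
    """
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    n = len(noisy)
    # Choose which examples to corrupt (without replacement).
    flip_idx = rng.choice(n, size=int(round(noise_rate * n)), replace=False)
    for i in flip_idx:
        # Sample from the num_classes - 1 classes other than the true one.
        offset = rng.integers(1, num_classes)
        noisy[i] = (noisy[i] + offset) % num_classes
    return noisy


if __name__ == "__main__":
    y = np.zeros(100, dtype=int)  # toy dataset: all labels are class 0
    y_noisy = inject_symmetric_label_noise(y, noise_rate=0.3, num_classes=10)
    print((y_noisy != y).sum())  # exactly 30 labels were flipped
```

One would then sweep model width (or density) while holding `noise_rate` and the training-set size fixed, and plot test loss against width to look for the final-ascent shape the abstract describes.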
