Paper Title
VeLO: Training Versatile Learned Optimizers by Scaling Up
Paper Authors
Paper Abstract
While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.
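To make the core idea concrete, below is a minimal JAX sketch of a learned optimizer: a tiny MLP that ingests per-parameter features (here, the gradient, a momentum accumulator, and the current parameter value) and outputs a parameter update. The architecture and all names here are illustrative assumptions, not VeLO's actual design (which uses a more involved hypernetwork); the real implementation is in the open-source release at velo-code.github.io.

```python
# Minimal sketch of a learned optimizer in JAX: a small MLP computes
# per-parameter updates from simple input features. Illustrative only;
# not the VeLO architecture.
import jax
import jax.numpy as jnp

HIDDEN = 8  # "small neural network": a handful of hidden units suffices here


def init_opt_params(key):
    """Initialize the MLP that maps 3 input features to 1 update."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (3, HIDDEN)) * 0.1,
        "b1": jnp.zeros(HIDDEN),
        "w2": jax.random.normal(k2, (HIDDEN, 1)) * 0.1,
    }


def learned_update(opt_params, grad, momentum, theta):
    """Apply the MLP elementwise to per-parameter features."""
    feats = jnp.stack([grad, momentum, theta], axis=-1)  # (..., 3)
    h = jnp.tanh(feats @ opt_params["w1"] + opt_params["b1"])
    return (h @ opt_params["w2"])[..., 0]  # one update per parameter


def opt_step(opt_params, theta, momentum, grad, decay=0.9):
    """One optimizer step: update momentum, then let the MLP move theta."""
    momentum = decay * momentum + (1.0 - decay) * grad
    update = learned_update(opt_params, grad, momentum, theta)
    return theta + update, momentum


# Example: run the (untrained) learned optimizer on a toy quadratic.
# In meta-training, opt_params themselves would be optimized so that
# this inner loop drives the loss down quickly across many tasks.
loss_fn = lambda theta: jnp.sum(theta**2)
theta = jnp.ones(4)
momentum = jnp.zeros(4)
opt_params = init_opt_params(jax.random.PRNGKey(0))
for _ in range(3):
    grad = jax.grad(loss_fn)(theta)
    theta, momentum = opt_step(opt_params, theta, momentum, grad)
print(loss_fn(theta))
```

Because the update rule is itself a parameterized function, the meta-training described in the abstract amounts to optimizing `opt_params` over a distribution of tasks, which is what consumed the roughly four thousand TPU-months of compute.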