Paper Title

Adam with Bandit Sampling for Deep Learning

Paper Authors

Rui Liu, Tianyi Wu, Barzan Mozafari

Abstract

Adam is a widely used optimization method for training deep learning models. It computes individual adaptive learning rates for different parameters. In this paper, we propose a generalization of Adam, called Adambs, that allows us to also adapt to different training examples based on their importance in the model's convergence. To achieve this, we maintain a distribution over all examples, selecting a mini-batch in each iteration by sampling according to this distribution, which we update using a multi-armed bandit algorithm. This ensures that examples that are more beneficial to the model training are sampled with higher probabilities. We theoretically show that Adambs improves the convergence rate of Adam---$O(\sqrt{\frac{\log n}{T} })$ instead of $O(\sqrt{\frac{n}{T}})$ in some cases. Experiments on various models and datasets demonstrate Adambs's fast convergence in practice.
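The mechanism described above (a distribution over training examples, updated with a multi-armed bandit rule so that high-loss examples are sampled more often) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's exact Adambs algorithm: the helpers `loss_fn` and `update_fn` are hypothetical stand-ins for the per-example loss computation and the Adam parameter update, and the EXP3-style update shown is one common choice of bandit rule.

```python
import numpy as np

def bandit_minibatch_sampling(X, y, loss_fn, update_fn, n_iters=100,
                              batch_size=32, eta=0.05, eps=0.1, seed=0):
    """Sketch of bandit-driven mini-batch sampling (EXP3-style).

    Hypothetical helper signatures (assumptions, not from the paper):
      loss_fn(X_batch, y_batch) -> per-example losses (np.ndarray)
      update_fn(X_batch, y_batch, weights) -> one optimizer (e.g. Adam) step
    Returns the final sampling distribution over the n examples.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    log_w = np.zeros(n)  # log-weights over examples; start uniform
    for _ in range(n_iters):
        # Mix the learned distribution with uniform exploration (eps).
        w = np.exp(log_w - log_w.max())
        p = (1 - eps) * w / w.sum() + eps / n
        idx = rng.choice(n, size=batch_size, replace=False, p=p)
        losses = loss_fn(X[idx], y[idx])
        # Importance-weight the sampled examples so the gradient
        # estimate stays unbiased under non-uniform sampling.
        update_fn(X[idx], y[idx], 1.0 / (n * p[idx]))
        # EXP3-style reward: higher loss -> higher future probability.
        log_w[idx] += eta * losses / p[idx]
    w = np.exp(log_w - log_w.max())
    return (1 - eps) * w / w.sum() + eps / n
```

The importance weights `1 / (n * p)` keep the stochastic gradient unbiased even though examples are no longer drawn uniformly, while the exploration term `eps / n` guarantees every example retains a nonzero sampling probability.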
