Paper Title
BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods to Deep Binary Model
Paper Authors
Paper Abstract
Recent methods have significantly reduced the performance degradation of Binary Neural Networks (BNNs), but guaranteeing effective and efficient training of BNNs remains an unsolved problem. The main reason is that the gradients estimated by the Straight-Through Estimator (STE) do not match the gradients of the real derivatives. In this paper, we provide an explicit convex optimization example in which training BNNs with traditional adaptive optimization methods still risks non-convergence, and we identify that constraining the range of gradients is critical for optimizing a deep binary model and avoiding highly suboptimal solutions. To address these issues, we propose the BAMSProd algorithm, built on the key observation that the convergence behavior of optimizing a deep binary model is strongly related to the quantization error. In brief, it employs an adaptive range constraint based on an error measurement to smooth the gradient transition, while following the exponential moving strategy of AMSGrad to avoid error accumulation during optimization. The experiments verify the corollary of our theoretical convergence analysis and further demonstrate that our optimization method speeds up convergence by about 1.2x and boosts the performance of BNNs by about 3.7% over a specific binary optimizer, even on a highly non-convex optimization problem.
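To give a concrete picture of the kind of update the abstract describes, below is a minimal NumPy sketch of one AMSGrad-style step augmented with an error-driven range constraint. This is not the paper's exact algorithm: the function name bamsprod_like_step and the specific clipping radius lr / (1 + quantization_error) are illustrative assumptions; only the non-decreasing second-moment estimate (from AMSGrad) and the idea of bounding the step by a quantization-error measurement come from the abstract.

import numpy as np

def bamsprod_like_step(w, grad, m, v, v_hat,
                       lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One sketched optimizer step: AMSGrad-style moment estimates plus an
    adaptive range constraint driven by the quantization error of the
    latent (real-valued) weights w. Illustrative only."""
    # Quantization error of the latent weights w.r.t. their binarized version.
    q_err = np.abs(w - np.sign(w))

    # Exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # AMSGrad: keep the maximum of past second-moment estimates.
    v_hat = np.maximum(v_hat, v)

    step = lr * m / (np.sqrt(v_hat) + eps)

    # Adaptive range constraint (assumed form): shrink the allowed step
    # where the quantization error is large, smoothing the transition.
    radius = lr / (1.0 + q_err)
    step = np.clip(step, -radius, radius)

    return w - step, m, v, v_hat

In use, m, v, and v_hat would be initialized to zero arrays of the same shape as w and carried across iterations; the clipping keeps a large estimated gradient from pushing a latent weight far past the binarization threshold in a single step, which is the intuition behind the range constraint described above.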