Paper Title

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Paper Authors

Tan M. Nguyen, Richard G. Baraniuk, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

Paper Abstract

Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long-short term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance. The code is available at https://github.com/minhtannguyen/MomentumRNN.
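To make the abstract's core idea more concrete, below is a minimal PyTorch sketch of a momentum-augmented recurrent cell. It assumes heavy-ball-style updates of the form v_t = mu * v_{t-1} + s * (W x_t) and h_t = tanh(U h_{t-1} + v_t), where mu is a momentum coefficient and s a step size; the class name MomentumRNNCell and the default hyperparameter values are illustrative choices, not taken from the authors' repository, so refer to the linked code for the official implementation.

```python
import torch
import torch.nn as nn


class MomentumRNNCell(nn.Module):
    """Illustrative momentum-augmented recurrent cell (a sketch, not the official code).

    Assumed update equations, mirroring heavy-ball gradient descent:
        v_t = mu * v_{t-1} + s * (W x_t)   # velocity (momentum) state
        h_t = tanh(U h_{t-1} + v_t)        # hidden-state update
    """

    def __init__(self, input_size, hidden_size, mu=0.6, s=0.6):
        super().__init__()
        self.W = nn.Linear(input_size, hidden_size, bias=True)    # input-to-hidden map
        self.U = nn.Linear(hidden_size, hidden_size, bias=False)  # hidden-to-hidden map
        self.mu = mu  # momentum coefficient (illustrative default)
        self.s = s    # step size (illustrative default)

    def forward(self, x_t, h_prev, v_prev):
        v_t = self.mu * v_prev + self.s * self.W(x_t)  # accumulate input with momentum
        h_t = torch.tanh(self.U(h_prev) + v_t)         # update hidden state from velocity
        return h_t, v_t


# Usage: unroll the cell over a toy sequence of shape (seq_len, batch, input_size).
cell = MomentumRNNCell(input_size=8, hidden_size=16)
x = torch.randn(5, 3, 8)
h = torch.zeros(3, 16)
v = torch.zeros(3, 16)
for t in range(x.size(0)):
    h, v = cell(x[t], h, v)
```

Compared with a vanilla RNN cell, the only structural change in this sketch is the extra velocity state v_t carried alongside the hidden state; swapping its update for an Adam- or Nesterov-style rule is the kind of variation the abstract refers to.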
