Paper Title
Tangent-Space Gradient Optimization of Tensor Network for Machine Learning
Paper Authors
Paper Abstract
The gradient-based optimization method for deep machine learning models suffers from gradient vanishing and exploding problems, particularly when the computational graph becomes deep. In this work, we propose the tangent-space gradient optimization (TSGO) for probabilistic models to keep the gradients from vanishing or exploding. The central idea is to guarantee the orthogonality between the variational parameters and the gradients. The optimization is then implemented by rotating the parameter vector towards the direction of the gradient. We explain and test TSGO in tensor network (TN) machine learning, where the TN describes the joint probability distribution as a normalized state $|\psi\rangle$ in Hilbert space. We show that the gradient can be restricted to the tangent space of the $\langle\psi|\psi\rangle = 1$ hyper-sphere. Instead of the additional adaptive methods used to control the learning rate in deep learning, the learning rate of TSGO is naturally determined by the rotation angle $\theta$ as $\eta = \tan\theta$. Our numerical results reveal better convergence of TSGO in comparison to the off-the-shelf Adam optimizer.
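As a minimal illustration of the update rule described in the abstract, the following NumPy sketch projects the gradient onto the tangent space of the unit hyper-sphere and rotates the parameter vector by an angle theta, so that the effective learning rate is eta = tan(theta). It assumes the variational parameters are flattened into a single vector; the function and variable names are ours for illustration, not from the paper.

import numpy as np

def tsgo_step(psi, grad, theta=0.05):
    # Keep the parameter vector on the <psi|psi> = 1 hyper-sphere.
    psi = psi / np.linalg.norm(psi)
    # Project the gradient onto the tangent space: remove the component
    # parallel to psi, enforcing orthogonality between parameters and step.
    g_tan = grad - np.dot(grad, psi) * psi
    g_hat = g_tan / np.linalg.norm(g_tan)
    # A step of size eta = tan(theta) followed by renormalization is
    # exactly a rotation of psi by the angle theta in the (psi, g_hat) plane.
    psi_new = psi - np.tan(theta) * g_hat
    return psi_new / np.linalg.norm(psi_new)

# Example: one update of a random 8-dimensional parameter vector.
rng = np.random.default_rng(0)
psi = rng.normal(size=8)
grad = rng.normal(size=8)
psi = tsgo_step(psi, grad)
print(np.linalg.norm(psi))  # norm stays 1 after the rotation

Because the update is a pure rotation, the parameter norm never grows or shrinks, which is how this scheme avoids the vanishing and exploding behavior of unconstrained gradient steps.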