Paper Title
Numerical Optimizations for Weighted Low-rank Estimation on Language Model
Paper Authors
Paper Abstract
Singular value decomposition (SVD) is one of the most popular compression methods, approximating a target matrix with smaller matrices. However, standard SVD treats all parameters within the matrix as equally important, which is a simple but unrealistic assumption. The parameters of a trained neural network may affect task performance unevenly, suggesting that their importance is unequal. Compared to standard SVD, a decomposition method that is aware of parameter importance is the more practical choice in real cases. Unlike standard SVD, weighted value decomposition is a non-convex optimization problem that lacks a closed-form solution. We systematically investigate multiple optimization strategies to tackle the problem and examine our method by compressing Transformer-based language models. Furthermore, we design a metric to predict when SVD may introduce a significant performance drop, for which our method can serve as a rescue strategy. Extensive evaluations demonstrate that our method outperforms current state-of-the-art (SOTA) methods in compressing Transformer-based language models.
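
To make the contrast concrete, the following is a minimal sketch (not the paper's implementation) of the two objectives the abstract describes: standard truncated SVD, which has a closed-form solution and treats every parameter equally, versus a weighted low-rank estimation that minimizes an importance-weighted reconstruction error and must be solved numerically. The matrix sizes, the `importance` tensor, and the use of plain gradient descent with Adam are all illustrative assumptions.

```python
# Sketch: standard truncated SVD vs. weighted low-rank estimation.
# The per-parameter `importance` matrix is a hypothetical stand-in for any
# importance estimate; all names and sizes here are illustrative assumptions.
import torch

torch.manual_seed(0)
W = torch.randn(256, 512)              # target weight matrix to compress
importance = torch.rand_like(W) + 0.1  # hypothetical per-parameter importance
rank = 32

# Standard truncated SVD: closed-form, treats every entry as equally important.
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
W_svd = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

# Weighted low-rank estimation: minimize sum(importance * (W - A @ B)^2).
# No closed-form solution in general, so optimize the factors numerically,
# starting from the truncated-SVD factors.
A = (U[:, :rank] @ torch.diag(S[:rank].sqrt())).clone().requires_grad_(True)
B = (torch.diag(S[:rank].sqrt()) @ Vh[:rank, :]).clone().requires_grad_(True)
opt = torch.optim.Adam([A, B], lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = (importance * (W - A @ B) ** 2).sum()
    loss.backward()
    opt.step()

# Compare importance-weighted reconstruction errors of the two factorizations.
err_svd = (importance * (W - W_svd) ** 2).sum().item()
err_weighted = (importance * (W - A @ B) ** 2).sum().item()
print(f"weighted error - plain SVD: {err_svd:.1f}, weighted solve: {err_weighted:.1f}")
```

Under this weighted objective, the numerically optimized factorization typically achieves a lower importance-weighted error than plain truncated SVD of the same rank, which mirrors the motivation stated in the abstract; the paper's actual optimization strategies and importance weights are described in the body of the work.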