Paper Title
Numerical Optimizations for Weighted Low-rank Estimation on Language Model
Paper Authors
Paper Abstract
Singular value decomposition (SVD) is one of the most popular compression methods, approximating a target matrix with smaller matrices. However, standard SVD treats all parameters within the matrix as equally important, which is a simple but unrealistic assumption. The parameters of a trained neural network may affect task performance unevenly, suggesting that their importance is unequal. Compared to standard SVD, a decomposition method that is aware of parameter importance is the more practical choice in real cases. Unlike standard SVD, weighted value decomposition is a non-convex optimization problem that lacks a closed-form solution. We systematically investigate multiple optimization strategies to tackle the problem and examine our method by compressing Transformer-based language models. Furthermore, we design a metric to predict when SVD may introduce a significant performance drop, for which our method can serve as a rescue strategy. Extensive evaluations demonstrate that our method outperforms current state-of-the-art (SOTA) methods in compressing Transformer-based language models.
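
To make the contrast concrete, the following is a minimal sketch (not the paper's implementation) of the two objectives the abstract describes: standard truncated SVD, which has a closed-form solution and treats every parameter equally, versus a weighted low-rank estimation that minimizes an importance-weighted reconstruction error and must be solved numerically. The matrix sizes, the `importance` tensor, and the use of plain gradient descent with Adam are all illustrative assumptions.

```python
# Sketch: standard truncated SVD vs. weighted low-rank estimation.
# The per-parameter `importance` matrix is a hypothetical stand-in for any
# importance estimate; all names and sizes here are illustrative assumptions.
import torch

torch.manual_seed(0)
W = torch.randn(256, 512)              # target weight matrix to compress
importance = torch.rand_like(W) + 0.1  # hypothetical per-parameter importance
rank = 32

# Standard truncated SVD: closed-form, treats every entry as equally important.
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
W_svd = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

# Weighted low-rank estimation: minimize sum(importance * (W - A @ B)^2).
# No closed-form solution in general, so optimize the factors numerically,
# starting from the truncated-SVD factors.
A = (U[:, :rank] @ torch.diag(S[:rank].sqrt())).clone().requires_grad_(True)
B = (torch.diag(S[:rank].sqrt()) @ Vh[:rank, :]).clone().requires_grad_(True)
opt = torch.optim.Adam([A, B], lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = (importance * (W - A @ B) ** 2).sum()
    loss.backward()
    opt.step()

# Compare importance-weighted reconstruction errors of the two factorizations.
err_svd = (importance * (W - W_svd) ** 2).sum().item()
err_weighted = (importance * (W - A @ B) ** 2).sum().item()
print(f"weighted error - plain SVD: {err_svd:.1f}, weighted solve: {err_weighted:.1f}")
```

Under this weighted objective, the numerically optimized factorization typically achieves a lower importance-weighted error than plain truncated SVD of the same rank, which mirrors the motivation stated in the abstract; the paper's actual optimization strategies and importance weights are described in the body of the work.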