Paper Title
FPT: Improving Prompt Tuning Efficiency via Progressive Training
Paper Authors
Paper Abstract
Recently, prompt tuning (PT) has gained increasing attention as a parameter-efficient way of tuning pre-trained language models (PLMs). Despite drastically reducing the number of tunable parameters and achieving satisfactory performance, PT is training-inefficient due to its slow convergence. To improve PT's training efficiency, we first make some novel observations about the prompt transferability of "partial PLMs", which are defined by compressing a PLM in depth or width. We observe that the soft prompts learned by partial PLMs of various sizes are similar in the parameter space, implying that these soft prompts could potentially be transferred among partial PLMs. Inspired by these observations, we propose Fast Prompt Tuning (FPT), which starts by conducting PT using a small-scale partial PLM, and then progressively expands its depth and width until the full model size is reached. After each expansion, we recycle the previously learned soft prompts as initialization for the enlarged partial PLM and then proceed with PT. We demonstrate the feasibility of FPT on 5 tasks and show that FPT can save over 30% of training compute while achieving comparable performance.
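The training schedule the abstract describes can be sketched as a simple loop. The sketch below is a toy illustration only: the partial PLM, the prompt-tuning step, and the expansion schedule are all stand-in assumptions, not the authors' actual implementation; only the control flow mirrors the paper's idea (tune on a small partial model, progressively grow depth and width, and recycle the learned soft prompt as initialization at each stage).

```python
import random

def tune_prompt(prompt, model_size, steps):
    """Placeholder for prompt tuning on a partial PLM of the given
    (depth, width); here it just perturbs the soft prompt deterministically."""
    rng = random.Random(0)
    return [p + rng.uniform(-0.01, 0.01) for p in prompt]

def fast_prompt_tuning(schedule, prompt_dim=8):
    """schedule: list of (depth, width, steps), smallest partial PLM first.

    The soft prompt is initialized once and then carried (recycled) across
    every expansion stage instead of being re-initialized per stage."""
    prompt = [0.0] * prompt_dim
    for depth, width, steps in schedule:   # progressively enlarge the model
        # Recycle the previously learned prompt as initialization for the
        # enlarged partial PLM, then continue tuning.
        prompt = tune_prompt(prompt, (depth, width), steps)
    return prompt

# Example: three stages growing toward a hypothetical full model size.
final_prompt = fast_prompt_tuning([(6, 384, 100), (9, 576, 100), (12, 768, 100)])
print(len(final_prompt))  # the prompt's shape is preserved across expansions
```

Note that because the prompt lives in its own parameter space, its shape never changes as the model grows, which is what makes the recycling step well-defined.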