Paper Title

Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt

Authors

Xinyin Ma, Xinchao Wang, Gongfan Fang, Yongliang Shen, Weiming Lu

Abstract

Data-free knowledge distillation (DFKD) conducts knowledge distillation via eliminating the dependence of original training data, and has recently achieved impressive results in accelerating pre-trained language models. At the heart of DFKD is to reconstruct a synthetic dataset by inverting the parameters of the uncompressed model. Prior DFKD approaches, however, have largely relied on hand-crafted priors of the target data distribution for the reconstruction, which can be inevitably biased and often incompetent to capture the intrinsic distributions. To address this problem, we propose a prompt-based method, termed as PromptDFD, that allows us to take advantage of learned language priors, which effectively harmonizes the synthetic sentences to be semantically and grammatically correct. Specifically, PromptDFD leverages a pre-trained generative model to provide language priors and introduces a reinforced topic prompter to control data synthesis, making the generated samples thematically relevant and semantically plausible, and thus friendly to downstream tasks. As shown in our experiments, the proposed method substantially improves the synthesis quality and achieves considerable improvements on distillation performance. In some cases, PromptDFD even gives rise to results on par with those from the data-driven knowledge distillation with access to the original training data.
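To make the pipeline described in the abstract concrete, below is a minimal sketch of prompt-based data-free distillation under stated assumptions: a frozen GPT-2 serves as the language prior, fixed topic prompts stand in for the paper's learned reinforced topic prompter, and the teacher/student checkpoint names (`textattack/bert-base-uncased-SST-2`, `distilbert-base-uncased`) and hyperparameters are illustrative placeholders, not the authors' released PromptDFD implementation.

```python
# Sketch: synthesize topic-conditioned sentences with a pre-trained LM,
# then distill the teacher's soft labels into a student on that synthetic data.
import torch
import torch.nn.functional as F
from transformers import (AutoModelForCausalLM, AutoModelForSequenceClassification,
                          AutoTokenizer)

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Pre-trained generative model supplies the language prior (GPT-2 here).
gen_tok = AutoTokenizer.from_pretrained("gpt2")
generator = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()

# 2) Teacher (uncompressed) and student classifiers; checkpoint names are placeholders.
teacher_name = "textattack/bert-base-uncased-SST-2"  # hypothetical teacher checkpoint
student_name = "distilbert-base-uncased"
tea_tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForSequenceClassification.from_pretrained(teacher_name).to(device).eval()
stu_tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForSequenceClassification.from_pretrained(student_name, num_labels=2).to(device)

optimizer = torch.optim.AdamW(student.parameters(), lr=3e-5)
T = 2.0  # distillation temperature

# Fixed topic prompts; in the paper the prompter is *learned* with a reward,
# whereas here we simply cycle over hand-picked topics for illustration.
topic_prompts = ["The movie was", "This film feels", "Overall, the acting is"]

def synthesize(prompt, n=4, max_new_tokens=30):
    """Sample topic-conditioned sentences from the frozen language prior."""
    inputs = gen_tok(prompt, return_tensors="pt").to(device)
    out = generator.generate(
        **inputs, do_sample=True, top_p=0.95, max_new_tokens=max_new_tokens,
        num_return_sequences=n, pad_token_id=gen_tok.eos_token_id,
    )
    return [gen_tok.decode(o, skip_special_tokens=True) for o in out]

for step in range(100):
    texts = synthesize(topic_prompts[step % len(topic_prompts)])

    # Teacher produces soft labels on the synthetic sentences (no real data used).
    with torch.no_grad():
        t_in = tea_tok(texts, return_tensors="pt", padding=True, truncation=True).to(device)
        t_logits = teacher(**t_in).logits

    s_in = stu_tok(texts, return_tensors="pt", padding=True, truncation=True).to(device)
    s_logits = student(**s_in).logits

    # Standard KL distillation loss on the prompt-conditioned synthetic batch.
    loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key component this sketch omits is the reinforced topic prompter itself: in the paper, prompts are optimized with a reinforcement-learning reward so that the generated samples stay thematically relevant to the downstream task, rather than being drawn from a fixed prompt list as above.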
