Paper Title
Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
Paper Authors
Paper Abstract
Black-Box Knowledge Distillation (B2KD) is a formulated problem for cloud-to-edge model compression with invisible data and models hosted on the server. B2KD faces challenges such as limited Internet exchange and the edge-cloud disparity of data distributions. In this paper, we formalize a two-step workflow consisting of deprivatization and distillation, and theoretically provide a new optimization direction, from logits to cell boundary, different from direct logits alignment. With its guidance, we propose a new method, Mapping-Emulation KD (MEKD), that distills a black-box cumbersome model into a lightweight one. Our method does not differentiate between treating soft or hard responses, and consists of: 1) deprivatization: emulating the inverse mapping of the teacher function with a generator, and 2) distillation: aligning the low-dimensional logits of the teacher and student models by reducing the distance of high-dimensional image points. For different teacher-student pairs, our method yields inspiring distillation performance on various benchmarks, and outperforms previous state-of-the-art approaches.
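Below is a minimal, hypothetical PyTorch sketch of how the two steps described in the abstract could be wired together. It is not the authors' implementation: the `Generator`, `deprivatize`, and `distill` names, the architectures, and all sizes are assumptions, and the teacher is treated as a locally differentiable stand-in even though the real B2KD setting only permits black-box queries.

```python
# Minimal sketch (not the authors' implementation) of the two-step workflow
# described in the abstract. All architectures, sizes, and hyperparameters
# are assumptions; the differentiable teacher stand-in is a simplification
# of the black-box setting, in which only query responses are available.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, C, H, W = 10, 3, 32, 32          # assumed problem sizes

class Generator(nn.Module):
    """Maps a logit code back to image space, emulating the teacher's inverse mapping."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_CLASSES, 512), nn.ReLU(),
            nn.Linear(512, C * H * W), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(-1, C, H, W)

def deprivatize(teacher, generator, opt, steps=1000, batch=64):
    """Step 1 (deprivatization): train G so that teacher(G(z)) reproduces the code z.
    NOTE: gradients pass through `teacher` only because this sketch uses a local
    differentiable stand-in; a true black-box teacher returns responses only."""
    for _ in range(steps):
        z = torch.randn(batch, NUM_CLASSES)               # random logit-space codes
        x = generator(z)                                  # synthesized images
        t = teacher(x)                                    # teacher response to G(z)
        loss = F.kl_div(F.log_softmax(t, dim=1),
                        F.softmax(z, dim=1), reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()

def distill(teacher, student, generator, opt, steps=1000, batch=64):
    """Step 2 (distillation): align teacher/student logits indirectly, by reducing the
    distance between the image points the frozen generator produces from those logits."""
    generator.requires_grad_(False)
    for _ in range(steps):
        z = torch.randn(batch, NUM_CLASSES)
        x = generator(z).detach()                         # transfer images
        with torch.no_grad():
            target = generator(teacher(x))                # image point of teacher logits
        pred = generator(student(x))                      # image point of student logits
        loss = F.mse_loss(pred, target)                   # high-dimensional distance
        opt.zero_grad(); loss.backward(); opt.step()
```

A typical call might run `deprivatize(teacher, G, torch.optim.Adam(G.parameters(), 1e-3))` followed by `distill(teacher, student, G, torch.optim.Adam(student.parameters(), 1e-3))`; the paper's actual objectives, generator design, and handling of black-box queries should be taken from the original work.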