Paper Title

Federated Knowledge Distillation

Authors

Hyowoon Seo, Jihong Park, Seungeun Oh, Mehdi Bennis, Seong-Lyun Kim

Abstract

Distributed learning frameworks often rely on exchanging model parameters across workers instead of revealing their raw data. A prime example is federated learning (FL), which exchanges the gradients or weights of each neural network model. Under limited communication resources, however, such a method becomes extremely costly, particularly for modern deep neural networks with huge numbers of model parameters. In this regard, federated distillation (FD) is a compelling distributed learning solution that exchanges only the model outputs, whose dimensions are commonly much smaller than the model sizes (e.g., the 10 labels of the MNIST dataset). The goal of this chapter is to provide a deep understanding of FD while demonstrating its communication efficiency and applicability to a variety of tasks. To this end, towards demystifying the operational principle of FD, the first part of this chapter provides a novel asymptotic analysis of two foundational algorithms of FD, namely knowledge distillation (KD) and co-distillation (CD), by exploiting the theory of the neural tangent kernel (NTK). Next, the second part elaborates on a baseline implementation of FD for a classification task and illustrates its performance in terms of accuracy and communication efficiency compared to FL. Lastly, to demonstrate the applicability of FD to various distributed learning tasks and environments, the third part presents two selected applications, namely FD over asymmetric uplink and downlink wireless channels, and FD for reinforcement learning.
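
To make the abstract's communication-efficiency argument concrete, below is a minimal sketch of the FD payload idea: each worker uploads per-label averaged logits rather than full model weights. The worker count, model size, and all function names are illustrative assumptions for this sketch, not the chapter's baseline implementation.

    # Minimal sketch of the federated distillation (FD) payload idea described
    # above: workers exchange per-label average model outputs (logits) instead
    # of full model weights. Shapes, names, and the 3-worker setup are
    # illustrative assumptions, not the chapter's reference implementation.
    import numpy as np

    NUM_CLASSES = 10          # e.g., the 10 labels of the MNIST dataset
    MODEL_SIZE = 1_000_000    # assumed parameter count of a modern deep net

    def label_averaged_logits(logits, labels):
        # Average a worker's output logits per ground-truth label,
        # producing a small (NUM_CLASSES x NUM_CLASSES) table.
        table = np.zeros((NUM_CLASSES, NUM_CLASSES))
        for c in range(NUM_CLASSES):
            mask = labels == c
            if mask.any():
                table[c] = logits[mask].mean(axis=0)
        return table

    # Each round, every worker uploads its small logit table; the server
    # averages the tables into a global teacher signal that each worker then
    # uses as a distillation regularizer on its local training loss.
    rng = np.random.default_rng(0)
    payloads = []
    for _ in range(3):                                   # three mock workers
        logits = rng.normal(size=(128, NUM_CLASSES))     # fake local outputs
        labels = rng.integers(0, NUM_CLASSES, size=128)  # fake local labels
        payloads.append(label_averaged_logits(logits, labels))
    global_logits = np.mean(payloads, axis=0)

    print("FD upload per round:", global_logits.size, "values")  # 100
    print("FL upload per round:", MODEL_SIZE, "parameters")      # ~1e6

In this sketch, the per-round payload gap (100 values versus roughly a million parameters) is what the abstract means by FD's communication efficiency relative to FL.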
