Title
Fine-grained Private Knowledge Distillation
Authors
Abstract
Knowledge distillation has emerged as a scalable and effective approach to privacy-preserving machine learning. One remaining drawback is that it consumes privacy in a model-level (i.e., client-level) manner: every distillation query incurs a privacy loss over all records of one client. To attain fine-grained privacy accounting and improve utility, this work proposes a model-free reverse $k$-NN labeling method for record-level private knowledge distillation, in which each record is used to label at most $k$ queries. Theoretically, we provide bounds on the labeling error rate under the centralized/local/shuffle models of differential privacy (with respect to the number of records per query and the privacy budgets). Experimentally, we demonstrate that it achieves new state-of-the-art accuracy with an order of magnitude lower privacy loss. Specifically, on the CIFAR-$10$ dataset it reaches $82.1\%$ test accuracy with a centralized privacy budget of $1.0$; on the MNIST/SVHN datasets it reaches $99.1\%$/$95.6\%$ accuracy respectively with budget $0.1$. This is the first time deep learning with differential privacy achieves comparable accuracy under reasonable data privacy protection (i.e., $\exp(\epsilon)\leq 1.5$). Our code is available at https://github.com/liyuntong9/rknn.
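The abstract only names the core mechanism, so a rough illustration may help: in reverse $k$-NN labeling, each private record votes its label onto its $k$ nearest public query points (so each record touches at most $k$ queries), and the per-query vote histograms are released with noise before an argmax produces the distillation labels. The sketch below is an assumption-laden toy version, not the paper's implementation: the function name, the brute-force nearest-query search, and the use of Laplace noise for the centralized model are all choices made here for illustration.

```python
import numpy as np

def reverse_knn_label(records_x, records_y, queries, k, eps, n_classes, seed=None):
    """Toy sketch of reverse k-NN labeling under centralized DP.

    Each private record (x, y) casts a vote for class y on its k
    nearest public queries; the vote matrix is released with Laplace
    noise, and each query is labeled by the noisy argmax.
    """
    rng = np.random.default_rng(seed)
    votes = np.zeros((len(queries), n_classes))
    for x, y in zip(records_x, records_y):
        dists = np.linalg.norm(queries - x, axis=1)
        for q in np.argsort(dists)[:k]:  # each record labels at most k queries
            votes[q, y] += 1
    # One record changes at most k entries of the vote matrix by 1,
    # so (under this toy accounting) Laplace noise of scale k/eps is added.
    noisy = votes + rng.laplace(scale=k / eps, size=votes.shape)
    return noisy.argmax(axis=1)
```

Because each record influences at most $k$ vote entries, privacy is accounted per record rather than per client, which is the record-level granularity the abstract refers to.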