Title
Fine-grained Private Knowledge Distillation
Authors
Abstract
Knowledge distillation has emerged as a scalable and effective approach to privacy-preserving machine learning. One remaining drawback is that it consumes privacy in a model-level (i.e., client-level) manner: every distillation query incurs a privacy loss over all records of one client. To attain fine-grained privacy accounting and improve utility, this work proposes a model-free reverse $k$-NN labeling method for record-level private knowledge distillation, in which each record is used to label at most $k$ queries. Theoretically, we provide bounds on the labeling error rate under the centralized/local/shuffle models of differential privacy (with respect to the number of records per query and the privacy budgets). Experimentally, we demonstrate that it achieves new state-of-the-art accuracy with an order of magnitude lower privacy loss. Specifically, on the CIFAR-$10$ dataset it reaches $82.1\%$ test accuracy with a centralized privacy budget of $1.0$; on the MNIST/SVHN datasets it reaches $99.1\%$/$95.6\%$ accuracy respectively with budget $0.1$. This is the first time deep learning with differential privacy achieves comparable accuracy under reasonable data privacy protection (i.e., $\exp(\epsilon)\leq 1.5$). Our code is available at https://github.com/liyuntong9/rknn.
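The abstract only names the core mechanism, so a rough illustration may help: in reverse $k$-NN labeling, each private record votes its label onto its $k$ nearest public query points (so each record touches at most $k$ queries), and the per-query vote histograms are released with noise before an argmax produces the distillation labels. The sketch below is an assumption-laden toy version, not the paper's implementation: the function name, the brute-force nearest-query search, and the use of Laplace noise for the centralized model are all choices made here for illustration.

```python
import numpy as np

def reverse_knn_label(records_x, records_y, queries, k, eps, n_classes, seed=None):
    """Toy sketch of reverse k-NN labeling under centralized DP.

    Each private record (x, y) casts a vote for class y on its k
    nearest public queries; the vote matrix is released with Laplace
    noise, and each query is labeled by the noisy argmax.
    """
    rng = np.random.default_rng(seed)
    votes = np.zeros((len(queries), n_classes))
    for x, y in zip(records_x, records_y):
        dists = np.linalg.norm(queries - x, axis=1)
        for q in np.argsort(dists)[:k]:  # each record labels at most k queries
            votes[q, y] += 1
    # One record changes at most k entries of the vote matrix by 1,
    # so (under this toy accounting) Laplace noise of scale k/eps is added.
    noisy = votes + rng.laplace(scale=k / eps, size=votes.shape)
    return noisy.argmax(axis=1)
```

Because each record influences at most $k$ vote entries, privacy is accounted per record rather than per client, which is the record-level granularity the abstract refers to.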