Paper Title
TOFU: Towards Obfuscated Federated Updates by Encoding Weight Updates into Gradients from Proxy Data
Paper Authors
Paper Abstract
Advances in Federated Learning and an abundance of user data have enabled rich collaborative learning between multiple clients, without sharing user data. This is done via a central server that aggregates learning in the form of weight updates. However, this comes at the cost of repeated expensive communication between the clients and the server, and raises concerns about compromised user privacy. The inversion of gradients into the data that generated them is termed data leakage. Encryption techniques can be used to counter this leakage, but at added expense. To address these challenges of communication efficiency and privacy, we propose TOFU, a novel algorithm that generates proxy data encoding each client's weight update in its gradients. This proxy data is then shared instead of the weight updates. Since input data is far lower in dimensional complexity than weights, this encoding allows us to send much less data per communication round. Additionally, the proxy data resembles noise, so even a perfect reconstruction via a data leakage attack would recover only this unrecognizable noise rather than user data, enhancing privacy. We show that TOFU enables learning with accuracy drops of less than 1% on MNIST and less than 7% on CIFAR-10. This drop can be recovered via a few rounds of expensive encrypted gradient exchange. This allows us to train to near-full accuracy in a federated setup, while being 4x and 6.6x more communication efficient than the standard Federated Averaging algorithm on MNIST and CIFAR-10, respectively.
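To make the encoding step concrete, below is a minimal sketch in PyTorch of the idea the abstract describes: optimizing noise-like proxy inputs and labels so that the gradient they induce on the model approximates the client's true weight update. The function name `encode_update`, the cosine-similarity matching objective, and all hyperparameters (`n_proxy`, `steps`, `lr`) are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def encode_update(model, target_update, input_shape, n_classes,
                  n_proxy=8, steps=200, lr=0.1):
    """Optimize noise-like proxy inputs and soft labels so that the gradient
    of the loss on them approximates the client's true weight update.

    target_update: list of tensors, one per trainable model parameter,
    holding the weight update the client wants to communicate.
    """
    x = torch.randn(n_proxy, *input_shape, requires_grad=True)  # proxy inputs
    y = torch.randn(n_proxy, n_classes, requires_grad=True)     # proxy label logits
    opt = torch.optim.Adam([x, y], lr=lr)
    params = [p for p in model.parameters() if p.requires_grad]
    for _ in range(steps):
        opt.zero_grad()
        # Soft-label cross-entropy on the proxy batch (assumed loss choice).
        loss = F.cross_entropy(model(x), y.softmax(dim=1))
        grads = torch.autograd.grad(loss, params, create_graph=True)
        # One plausible matching objective: cosine distance between the proxy
        # gradient and the target update, summed over parameter tensors.
        mismatch = sum(1 - F.cosine_similarity(g.flatten(), t.flatten(), dim=0)
                       for g, t in zip(grads, target_update))
        mismatch.backward()  # model params also get grads here, but are never stepped
        opt.step()
    return x.detach(), y.detach()  # shared with the server instead of the update
```

On the receiving end, decoding would amount to a single forward/backward pass: computing the gradient of the same loss on the shared proxy batch recovers an approximation of the client's weight update, which the server can then aggregate as usual.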