Paper Title

Practical Privacy Attacks on Vertical Federated Learning

Paper Authors

Haiqin Weng, Juntao Zhang, Xingjun Ma, Feng Xue, Tao Wei, Shouling Ji, Zhiyuan Zong

Paper Abstract

Federated learning (FL) is a privacy-preserving learning paradigm that allows multiple parties to jointly train a powerful machine learning model without sharing their private data. According to the form of collaboration, FL can be further divided into horizontal federated learning (HFL) and vertical federated learning (VFL). In HFL, participants share the same feature space and collaborate on data samples, while in VFL, participants share the same sample IDs and collaborate on features. VFL has a broader scope of applications and is arguably more suitable for joint model training between large enterprises. In this paper, we focus on VFL and investigate potential privacy leakage in real-world VFL frameworks. We design and implement two practical privacy attacks: a reverse multiplication attack for the logistic regression VFL protocol, and a reverse sum attack for the XGBoost VFL protocol. We empirically show that the two attacks are (1) effective - the adversary can successfully steal the private training data, even when the intermediate outputs are encrypted to protect data privacy; (2) evasive - the attacks neither deviate from the protocol specification nor deteriorate the accuracy of the target model; and (3) easy - the adversary needs little prior knowledge about the data distribution of the target participant. We also show that the leaked information is as effective as the raw training data in training an alternative classifier. We further discuss potential countermeasures and their challenges, which we hope can lead to several promising research directions.
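To make the HFL/VFL distinction described in the abstract concrete, the following Python sketch (not from the paper; all names, shapes, and the two-party setup are illustrative assumptions) partitions a toy dataset horizontally (same feature space, disjoint samples) and vertically (same sample IDs, disjoint features):

    import numpy as np

    # Toy dataset: 6 samples with 4 features. Shapes are illustrative only.
    rng = np.random.default_rng(0)
    sample_ids = np.arange(6)
    X = rng.normal(size=(6, 4))

    # Horizontal federated learning (HFL): parties hold the same feature space
    # but disjoint sets of samples (rows).
    hfl_party_a = X[sample_ids < 3, :]   # samples 0-2, all 4 features
    hfl_party_b = X[sample_ids >= 3, :]  # samples 3-5, all 4 features

    # Vertical federated learning (VFL): parties hold the same sample IDs
    # but disjoint subsets of features (columns).
    vfl_party_a = X[:, :2]  # all 6 samples, features 0-1 (e.g., held by a bank)
    vfl_party_b = X[:, 2:]  # all 6 samples, features 2-3 (e.g., held by a retailer)

    print(hfl_party_a.shape, hfl_party_b.shape)  # (3, 4) (3, 4)
    print(vfl_party_a.shape, vfl_party_b.shape)  # (6, 2) (6, 2)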
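As a rough intuition for why accumulated multiplication results can leak private features (this is only a simplified linear-algebra toy, not the paper's reverse multiplication attack, which operates on the encrypted logistic regression VFL protocol): if an adversary observes the products of the target's private feature matrix with enough known, linearly independent vectors, the matrix is uniquely determined. All variable names and dimensions below are hypothetical.

    import numpy as np

    # Hypothetical setting: the target party holds private features X_t (n x d).
    # Assume the adversary learns the products X_t @ W for d known, linearly
    # independent multiplier vectors stacked as the columns of W.
    rng = np.random.default_rng(1)
    n, d = 8, 3
    X_t = rng.normal(size=(n, d))   # private feature matrix (unknown to adversary)
    W = rng.normal(size=(d, d))     # known multipliers (almost surely invertible)
    P = X_t @ W                     # observed multiplication results (n x d)

    # With d independent columns in W, the private matrix is recoverable:
    X_recovered = P @ np.linalg.inv(W)
    print(np.allclose(X_recovered, X_t))  # True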
