Paper Title
Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models
Paper Authors
Paper Abstract
Increasing use of machine learning (ML) technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand if these ML technologies are introducing leakage of sensitive and proprietary training data. In this paper, we focus on model inversion attacks where the adversary knows non-sensitive attributes about records in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using only black-box access to the target classification model. We first devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state-of-the-art. We then introduce a label-only model inversion attack that relies only on the model's predicted labels but still matches our confidence score-based attack in terms of attack effectiveness. We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary. We evaluate our attacks on two types of machine learning models, decision tree and deep neural network, trained on three real datasets. Moreover, we empirically demonstrate the disparate vulnerability of model inversion attacks, i.e., specific groups in the training dataset (grouped by gender, race, etc.) could be more vulnerable to model inversion attacks.
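The general shape of a confidence score-based model inversion attribute inference attack, as described in the abstract, can be sketched as follows. This is an illustrative reconstruction, not the paper's actual algorithm: the adversary knows a record's non-sensitive attributes and its true label, enumerates candidate values of the sensitive attribute, queries the black-box model for each completed record, and guesses the candidate whose returned confidence for the true label is highest. The function names, the optional prior weighting, and the toy model below are all assumptions for illustration.

```python
# Illustrative sketch of a confidence score-based model inversion
# attribute inference attack. The scoring rule (confidence on the
# record's true label, optionally weighted by a marginal prior) is a
# plausible stand-in, not the authors' exact method.

def invert_sensitive_attribute(query_model, known_attrs, true_label,
                               candidates, prior=None):
    """Guess the sensitive attribute of one target record.

    query_model(record) -> {class label: confidence score}, simulating
    black-box access to the target classification model.
    """
    best_value, best_score = None, float("-inf")
    for value in candidates:
        # Complete the record with one candidate sensitive value.
        record = dict(known_attrs, sensitive=value)
        confidences = query_model(record)
        # Score each candidate by the confidence the model assigns to
        # the record's known true label.
        score = confidences[true_label]
        if prior is not None:
            score *= prior[value]  # optional marginal prior over values
        if score > best_score:
            best_value, best_score = value, score
    return best_value

# Toy black-box model (hypothetical): confidence on label 1 is higher
# when the sensitive attribute takes the value "yes".
def toy_model(record):
    p = 0.9 if record["sensitive"] == "yes" else 0.4
    return {1: p, 0: 1.0 - p}

guess = invert_sensitive_attribute(toy_model, {"age": 40}, true_label=1,
                                   candidates=["yes", "no"])
print(guess)  # -> yes
```

A label-only variant of the same loop would replace the confidence lookup with repeated queries for predicted labels alone, scoring candidates by how often the model's prediction agrees with the record's true label.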