Paper Title

Reducing Risk of Model Inversion Using Privacy-Guided Training

Authors

Abigail Goldsteen, Gilad Ezov, Ariel Farkash

Abstract

Machine learning models often pose a threat to the privacy of individuals whose data is part of the training set. Several recent attacks have been able to infer sensitive information from trained models, including model inversion or attribute inference attacks. These attacks are able to reveal the values of certain sensitive features of individuals who participated in training the model. It has also been shown that several factors can contribute to an increased risk of model inversion, including feature influence. We observe that not all features necessarily share the same level of privacy or sensitivity. In many cases, certain features used to train a model are considered especially sensitive and therefore propitious candidates for inversion. We present a solution for countering model inversion attacks in tree-based models, by reducing the influence of sensitive features in these models. This is an avenue that has not yet been thoroughly investigated, with only very nascent previous attempts at using this as a countermeasure against attribute inference. Our work shows that, in many cases, it is possible to train a model in different ways, resulting in different influence levels of the various features, without necessarily harming the model's accuracy. We are able to utilize this fact to train models in a manner that reduces the model's reliance on the most sensitive features, while increasing the importance of less sensitive features. Our evaluation confirms that training models in this manner reduces the risk of inference for those features, as demonstrated through several black-box and white-box attacks.
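
The abstract does not detail the training procedure itself. As a rough illustration of the underlying idea (reducing a tree-based model's reliance on a sensitive feature without necessarily harming accuracy), the sketch below trains two random forests with scikit-learn: one on raw features, and one where a hypothetical sensitive feature is coarsened into quartile bins before training, which typically shifts split importance toward the remaining features. The dataset, feature index, and binning scheme are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch only -- the paper's actual privacy-guided training
# algorithm is not described in the abstract. Here, the influence of a
# hypothetical sensitive feature (index 0) is reduced by coarsening it
# into quartile bins before fitting a tree-based model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

SENSITIVE = 0  # hypothetical index of the sensitive feature
bins = np.quantile(X_tr[:, SENSITIVE], [0.25, 0.5, 0.75])

def coarsen(X):
    """Replace the sensitive feature with its quartile bin index."""
    X = X.copy()
    X[:, SENSITIVE] = np.digitize(X[:, SENSITIVE], bins)
    return X

# Baseline model trained on the raw features.
baseline = RandomForestClassifier(n_estimators=100, random_state=0)
baseline.fit(X_tr, y_tr)

# "Privacy-guided" variant: the trees can extract only coarse-grained
# information from the binned sensitive feature, so split importance
# shifts toward the less sensitive features.
guided = RandomForestClassifier(n_estimators=100, random_state=0)
guided.fit(coarsen(X_tr), y_tr)

print("sensitive-feature importance, baseline:",
      baseline.feature_importances_[SENSITIVE])
print("sensitive-feature importance, guided:  ",
      guided.feature_importances_[SENSITIVE])
print("test accuracy, baseline:", baseline.score(X_te, y_te))
print("test accuracy, guided:  ", guided.score(coarsen(X_te), y_te))
```

Comparing the two models' feature_importances_ and test accuracy loosely mirrors the trade-off the paper evaluates: lower importance for the sensitive feature at little or no cost in accuracy.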
