Paper Title

Model-Targeted Poisoning Attacks with Provable Convergence

Paper Authors

Fnu Suya, Saeed Mahloujifar, Anshuman Suri, David Evans, Yuan Tian

Paper Abstract

In a poisoning attack, an adversary with control over a small fraction of the training data attempts to select that data in a way that induces a corrupted model that misbehaves in favor of the adversary. We consider poisoning attacks against convex machine learning models and propose an efficient poisoning attack designed to induce a specified model. Unlike previous model-targeted poisoning attacks, our attack comes with provable convergence to any attainable target classifier. The distance from the induced classifier to the target classifier is inversely proportional to the square root of the number of poisoning points. We also provide a lower bound on the minimum number of poisoning points needed to achieve a given target classifier. Our method uses online convex optimization, so it finds poisoning points incrementally. This provides more flexibility than previous attacks, which require an a priori assumption about the number of poisoning points. Our attack is the first model-targeted poisoning attack that provides provable convergence for convex models, and in our experiments, it either exceeds or matches state-of-the-art attacks in terms of attack success rate and distance to the target model.
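
The abstract leaves the attack's details to the body of the paper, but the incremental, online-optimization flavor it describes can be sketched roughly as follows: repeatedly train on the clean data plus the poisoning points chosen so far, then add the candidate point on which the current model's loss most exceeds the target model's loss. This is only a minimal sketch under assumed details; the function name `model_targeted_poisoning`, the candidate pool (`candidates_X`, `candidates_y`), the logistic-regression stand-in for the convex model, and the stopping tolerance `tol` are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def logistic_loss(model, X, y):
    # Per-example logistic loss for a linear model with coef_/intercept_
    # attributes (e.g., a fitted sklearn LogisticRegression); labels in {-1, +1}.
    margins = y * (X @ model.coef_.ravel() + model.intercept_[0])
    return np.logaddexp(0.0, -margins)

def model_targeted_poisoning(X_clean, y_clean, target_model,
                             candidates_X, candidates_y,
                             max_points=100, tol=1e-3):
    # Illustrative sketch: incrementally pick poisoning points that push the
    # trained model toward a given (attainable) target model.
    X_poison, y_poison = [], []
    for _ in range(max_points):
        X_train = np.vstack([X_clean] + X_poison) if X_poison else X_clean
        y_train = np.concatenate([y_clean] + y_poison) if y_poison else y_clean
        current = LogisticRegression().fit(X_train, y_train)
        # Choose the candidate with the largest loss gap between the current
        # model and the target model (the point where they disagree most).
        gap = (logistic_loss(current, candidates_X, candidates_y)
               - logistic_loss(target_model, candidates_X, candidates_y))
        best = int(np.argmax(gap))
        if gap[best] < tol:  # current model is close enough to the target
            break
        X_poison.append(candidates_X[best:best + 1])
        y_poison.append(candidates_y[best:best + 1])
    return X_poison, y_poison
```

Because points are selected one at a time, the attack can stop as soon as the induced model is acceptably close to the target, rather than committing to a fixed poisoning budget in advance.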
