Paper Title

Systematic Evaluation of Backdoor Data Poisoning Attacks on Image Classifiers

Paper Authors

Loc Truong, Chace Jones, Brian Hutchinson, Andrew August, Brenda Praggastis, Robert Jasper, Nicole Nichols, Aaron Tuor

Paper Abstract

Backdoor data poisoning attacks have recently been demonstrated in computer vision research as a potential safety risk for machine learning (ML) systems. Traditional data poisoning attacks manipulate training data to induce unreliability of an ML model, whereas backdoor data poisoning attacks maintain system performance unless the ML model is presented with an input containing an embedded "trigger" that provides a predetermined response advantageous to the adversary. Our work builds upon prior backdoor data-poisoning research for ML image classifiers and systematically assesses different experimental conditions including types of trigger patterns, persistence of trigger patterns during retraining, poisoning strategies, architectures (ResNet-50, NasNet, NasNet-Mobile), datasets (Flowers, CIFAR-10), and potential defensive regularization techniques (Contrastive Loss, Logit Squeezing, Manifold Mixup, Soft-Nearest-Neighbors Loss). Experiments yield four key findings. First, the success rate of backdoor poisoning attacks varies widely, depending on several factors, including model architecture, trigger pattern and regularization technique. Second, we find that poisoned models are hard to detect through performance inspection alone. Third, regularization typically reduces backdoor success rate, although it can have no effect or even slightly increase it, depending on the form of regularization. Finally, backdoors inserted through data poisoning can be rendered ineffective after just a few epochs of additional training on a small set of clean data without affecting the model's performance.
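The abstract describes backdoor data poisoning only in general terms. As a rough illustration of one common dirty-label setup, the sketch below stamps a small square trigger into a fraction of the training images and overwrites their labels with the attacker's target class. The function name `poison_dataset`, the corner-patch trigger, the 5% poisoning rate, and the (N, H, W, C) float image layout are illustrative assumptions, not the paper's exact trigger patterns or poisoning strategies.

```python
import numpy as np

def poison_dataset(images, labels, target_class=0, poison_rate=0.05,
                   trigger_size=4, trigger_value=1.0, seed=0):
    """Stamp a square trigger onto a random subset of images and relabel
    those samples to the attacker's target class (dirty-label poisoning)."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Solid square trigger in the bottom-right corner; assumes images are
    # shaped (N, H, W, C) with float pixel values in [0, 1].
    images[idx, -trigger_size:, -trigger_size:, :] = trigger_value
    labels[idx] = target_class
    return images, labels

# Example with CIFAR-10-sized random data: 5% of samples carry the trigger
# and are relabeled to class 0.
x = np.random.rand(1000, 32, 32, 3).astype(np.float32)
y = np.random.randint(0, 10, size=1000)
x_poisoned, y_poisoned = poison_dataset(x, y)
```

A model trained on such a set typically keeps its clean-data accuracy, which is why the abstract notes that poisoned models are hard to detect through performance inspection alone.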
