Paper Title
Backdoor Attacks against Transfer Learning with Pre-trained Deep Learning Models
Paper Authors
Paper Abstract
Transfer learning provides an effective solution for quickly and feasibly customizing accurate \textit{Student} models: the knowledge that a pre-trained \textit{Teacher} model has learned over a large dataset is transferred via fine-tuning. Many pre-trained Teacher models used in transfer learning are publicly available and maintained on public platforms, which increases their vulnerability to backdoor attacks. In this paper, we demonstrate a backdoor threat to transfer-learning tasks on both image and time-series data that leverages the knowledge of publicly accessible Teacher models and is aimed at defeating three commonly adopted defenses: \textit{pruning-based}, \textit{retraining-based} and \textit{input pre-processing-based defenses}. Specifically, (A) a ranking-based selection mechanism is proposed to speed up backdoor trigger generation and the perturbation process while defeating \textit{pruning-based} and/or \textit{retraining-based defenses}; (B) autoencoder-powered trigger generation is proposed to produce a robust trigger that can defeat the \textit{input pre-processing-based defense}, while guaranteeing that the selected neuron(s) can be significantly activated; and (C) defense-aware retraining is proposed to generate the manipulated model using reverse-engineered model inputs. We launch effective misclassification attacks on Student models over real-world images, brain Magnetic Resonance Imaging (MRI) data and Electrocardiography (ECG) learning systems. The experiments reveal that our enhanced attack maintains the same $98.4\%$ and $97.2\%$ classification accuracy as the genuine model on clean image and time-series inputs respectively, while improving the attack success rate on trojaned image and time-series inputs by $27.9\%-100\%$ and $27.1\%-56.1\%$ respectively in the presence of pruning-based and/or retraining-based defenses.
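As a rough illustration of steps (A) and (B), the sketch below ranks the neurons of a toy ReLU layer by their mean activation on clean inputs (neurons that fire strongly on clean data are unlikely to be removed by a pruning-based defense) and then iteratively crafts a bounded trigger that strongly activates the selected neurons. The layer sizes, step size, and the plain gradient-ascent trigger generator (standing in for the paper's autoencoder-powered generator) are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical penultimate ReLU layer of a Teacher model (toy sizes):
# 16 neurons over a 64-dimensional flattened input.
W = rng.normal(size=(16, 64))

def activations(x):
    """ReLU activations of the toy layer for one input vector."""
    return np.maximum(W @ x, 0.0)

# (A) Ranking-based selection: rank neurons by mean activation on clean
# inputs and pick the top ones -- such neurons carry clean-task signal,
# so a pruning-based defense is unlikely to remove them.
clean = rng.normal(size=(100, 64))            # stand-in clean dataset
mean_act = np.maximum(clean @ W.T, 0.0).mean(axis=0)
target = np.argsort(mean_act)[-2:]            # top-2 surviving neurons

# (B) Trigger generation sketch: iterative ascent on the summed
# pre-activation of the selected neurons, with the perturbation kept
# bounded, so the trigger activates them far above clean levels.
trigger = np.zeros(64)
for _ in range(200):
    grad = W[target].sum(axis=0)              # ascent direction
    trigger = np.clip(trigger + 0.1 * grad, -1.0, 1.0)
```

After the loop, `activations(trigger)[target]` is far above `mean_act[target]`, i.e. the bounded trigger drives the selected neurons well beyond their typical clean-input response, which is the property the selection and generation steps are designed to guarantee.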