Paper Title
Backdoor Attacks against Transfer Learning with Pre-trained Deep Learning Models
Paper Authors
Paper Abstract
Transfer learning provides an effective solution for quickly and feasibly customizing accurate \textit{Student} models: the knowledge that a pre-trained \textit{Teacher} model has learned over a large dataset is transferred via fine-tuning. Many pre-trained Teacher models used in transfer learning are publicly available and maintained on public platforms, which increases their vulnerability to backdoor attacks. In this paper, we demonstrate a backdoor threat to transfer-learning tasks on both image and time-series data that leverages the knowledge of publicly accessible Teacher models and is aimed at defeating three commonly adopted defenses: \textit{pruning-based}, \textit{retraining-based} and \textit{input pre-processing-based defenses}. Specifically, (A) a ranking-based selection mechanism is proposed to speed up backdoor trigger generation and the perturbation process while defeating \textit{pruning-based} and/or \textit{retraining-based defenses}; (B) autoencoder-powered trigger generation is proposed to produce a robust trigger that can defeat the \textit{input pre-processing-based defense}, while guaranteeing that the selected neuron(s) can be significantly activated; and (C) defense-aware retraining is proposed to generate the manipulated model using reverse-engineered model inputs. We launch effective misclassification attacks on Student models over real-world images, brain Magnetic Resonance Imaging (MRI) data and Electrocardiography (ECG) learning systems. The experiments reveal that our enhanced attack maintains the same $98.4\%$ and $97.2\%$ classification accuracy as the genuine model on clean image and time-series inputs respectively, while improving the attack success rate on trojaned image and time-series inputs by $27.9\%-100\%$ and $27.1\%-56.1\%$ respectively in the presence of pruning-based and/or retraining-based defenses.
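As a rough illustration of steps (A) and (B), the sketch below ranks the neurons of a toy ReLU layer by their mean activation on clean inputs (neurons that fire strongly on clean data are unlikely to be removed by a pruning-based defense) and then iteratively crafts a bounded trigger that strongly activates the selected neurons. The layer sizes, step size, and the plain gradient-ascent trigger generator (standing in for the paper's autoencoder-powered generator) are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical penultimate ReLU layer of a Teacher model (toy sizes):
# 16 neurons over a 64-dimensional flattened input.
W = rng.normal(size=(16, 64))

def activations(x):
    """ReLU activations of the toy layer for one input vector."""
    return np.maximum(W @ x, 0.0)

# (A) Ranking-based selection: rank neurons by mean activation on clean
# inputs and pick the top ones -- such neurons carry clean-task signal,
# so a pruning-based defense is unlikely to remove them.
clean = rng.normal(size=(100, 64))            # stand-in clean dataset
mean_act = np.maximum(clean @ W.T, 0.0).mean(axis=0)
target = np.argsort(mean_act)[-2:]            # top-2 surviving neurons

# (B) Trigger generation sketch: iterative ascent on the summed
# pre-activation of the selected neurons, with the perturbation kept
# bounded, so the trigger activates them far above clean levels.
trigger = np.zeros(64)
for _ in range(200):
    grad = W[target].sum(axis=0)              # ascent direction
    trigger = np.clip(trigger + 0.1 * grad, -1.0, 1.0)
```

After the loop, `activations(trigger)[target]` is far above `mean_act[target]`, i.e. the bounded trigger drives the selected neurons well beyond their typical clean-input response, which is the property the selection and generation steps are designed to guarantee.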