Paper Title

Backdoor Defense via Suppressing Model Shortcuts

Authors

Sheng Yang, Yiming Li, Yong Jiang, Shu-Tao Xia

Abstract

Recent studies have demonstrated that deep neural networks (DNNs) are vulnerable to backdoor attacks during the training process. Specifically, the adversaries intend to embed hidden backdoors in DNNs so that malicious model predictions can be activated through pre-defined trigger patterns. In this paper, we explore the backdoor mechanism from the angle of the model structure. We select the skip connection for discussion, inspired by the understanding that it facilitates the learning of model 'shortcuts', through which backdoor triggers are usually easier to learn. Specifically, we demonstrate that the attack success rate (ASR) decreases significantly when the outputs of certain key skip connections are reduced. Based on this observation, we design a simple yet effective backdoor removal method that suppresses the skip connections in critical layers selected by our method. We also fine-tune these layers to recover high benign accuracy and to further reduce the ASR. Extensive experiments on benchmark datasets verify the effectiveness of our method.
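The core idea of the abstract can be illustrated with a toy residual computation. The sketch below is a minimal illustration, not the paper's implementation: `gamma` is a hypothetical suppression coefficient that scales the skip connection's output, where `gamma = 1.0` recovers the standard residual block and smaller values suppress the shortcut, as the defense does for critical layers.

```python
import numpy as np

def residual_block(x, weight, gamma=1.0):
    """Toy residual block: y = relu(W @ x) + gamma * x.

    gamma scales the skip (identity) branch. gamma=1.0 is the ordinary
    residual block; gamma < 1.0 suppresses the shortcut, illustrating
    the defense idea from the abstract. This is a conceptual sketch,
    not the authors' code.
    """
    return np.maximum(weight @ x, 0.0) + gamma * x

# With a zero residual branch, the output is exactly gamma * x,
# making the effect of suppression directly visible.
x = np.ones(3)
W = np.zeros((3, 3))
y_full = residual_block(x, W, gamma=1.0)        # shortcut intact
y_suppressed = residual_block(x, W, gamma=0.5)  # shortcut halved
```

In the actual defense, such a coefficient would be applied only to the key skip connections the method identifies, followed by fine-tuning of those layers to restore benign accuracy.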
