Paper Title

Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases

Authors

Ren Wang, Gaoyuan Zhang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong, Meng Wang

Abstract

When the training data are maliciously tampered, the predictions of the acquired deep neural network (DNN) can be manipulated by an adversary known as the Trojan attack (or poisoning backdoor attack). The lack of robustness of DNNs against Trojan attacks could significantly harm real-life machine learning (ML) systems in downstream applications, therefore posing widespread concern to their trustworthiness. In this paper, we study the problem of the Trojan network (TrojanNet) detection in the data-scarce regime, where only the weights of a trained DNN are accessed by the detector. We first propose a data-limited TrojanNet detector (TND), when only a few data samples are available for TrojanNet detection. We show that an effective data-limited TND can be established by exploring connections between Trojan attack and prediction-evasion adversarial attacks including per-sample attack as well as all-sample universal attack. In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples. We show that such a TND can be built by leveraging the internal response of hidden neurons, which exhibits the Trojan behavior even at random noise inputs. The effectiveness of our proposals is evaluated by extensive experiments under different model architectures and datasets including CIFAR-10, GTSRB, and ImageNet.
