Paper Title
Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios
Paper Authors
Paper Abstract
Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers. A victim classifier will predict an attacker-desired target class whenever a test sample is embedded with the same backdoor pattern (BP) that was used to poison the classifier's training set. Detecting whether a classifier has been backdoor attacked is not easy in practice, especially when the defender is, e.g., a downstream user without access to the classifier's training set. This challenge is addressed here by a reverse-engineering defense (RED), an approach that has been shown to yield state-of-the-art performance in several domains. However, existing REDs are not applicable when there are only two classes or when multiple attacks are present. These scenarios are studied for the first time in the current paper, under the practical constraints that the defender has access neither to the classifier's training set nor to supervision from clean reference classifiers trained for the same domain. We propose a detection framework based on BP reverse-engineering and a novel expected transferability (ET) statistic. We show that our ET statistic is effective using the same detection threshold, irrespective of the classification domain, the attack configuration, and the BP reverse-engineering algorithm that is used. The excellent performance of our method is demonstrated on six benchmark datasets. Notably, our detection framework is also applicable to multi-class scenarios with multiple attacks. Code is available at https://github.com/zhenxianglance/2ClassBADetection.
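The abstract does not define the ET statistic, so the following is only a rough illustrative sketch and not the authors' implementation. It assumes one plausible reading of "transferability": a pattern reverse-engineered for one clean sample also changes the predicted class of other clean samples. The function names, the boolean `flips` matrix, and the threshold value 0.5 are all hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def expected_transferability(flips: Sequence[Sequence[bool]]) -> float:
    """Average fraction of *other* samples affected by each sample's pattern.

    Assumption: flips[i][j] is True when the pattern reverse-engineered for
    sample i also changes the classifier's prediction on sample j.
    """
    n = len(flips)
    if n < 2:
        raise ValueError("need at least two samples")
    per_sample = []
    for i, row in enumerate(flips):
        # Exclude the sample the pattern was estimated on.
        others = [row[j] for j in range(n) if j != i]
        per_sample.append(sum(others) / len(others))
    return sum(per_sample) / n


def is_flagged(et: float, threshold: float) -> bool:
    """Flag a possible attack when the statistic exceeds a fixed threshold."""
    return et > threshold


# Toy input: three samples, made-up transfer outcomes.
flips = [
    [True, True, True],
    [True, True, False],
    [False, True, True],
]
et = expected_transferability(flips)
flagged = is_flagged(et, threshold=0.5)  # 0.5 is a hypothetical value
```

The point of the sketch is only that a single scalar in [0, 1] can be compared against one fixed threshold regardless of the domain; how the patterns are estimated and how transfer is measured are left to the paper and its released code.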