Paper Title
Connecting the Dots: Detecting Adversarial Perturbations Using Context Inconsistency
Paper Authors
Paper Abstract
There has been a recent surge in research on adversarial perturbations that defeat Deep Neural Networks (DNNs) in machine vision; most of these perturbation-based attacks target object classifiers. Inspired by the observation that humans are able to recognize objects that appear out of place in a scene or alongside other unlikely objects, we augment the DNN with a system that learns context consistency rules during training and checks for violations of these rules during testing. Our approach builds a set of auto-encoders, one for each object class, appropriately trained so as to output a discrepancy between the input and output if an added adversarial perturbation violates context consistency rules. Experiments on PASCAL VOC and MS COCO show that our method effectively detects various adversarial attacks and achieves high ROC-AUC (over 0.95 in most cases); this corresponds to over 20% improvement over a state-of-the-art context-agnostic method.
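The abstract describes a per-class auto-encoder scheme in which the reconstruction discrepancy serves as the adversarial-detection signal. The following is a minimal, hypothetical PyTorch sketch of that idea only; the network sizes, the 256-dimensional "context feature", the `detection_score` helper, and the fixed threshold are illustrative assumptions and do not reproduce the paper's actual architecture or training procedure.

```python
# Hypothetical sketch: one auto-encoder per object class; the reconstruction
# discrepancy of a detection's context feature under its predicted class's
# auto-encoder is used as an adversarial-detection score.
import torch
import torch.nn as nn


class ClassAutoEncoder(nn.Module):
    """Small fully connected auto-encoder for one object class (illustrative sizes)."""

    def __init__(self, feat_dim: int = 256, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def detection_score(autoencoders: dict, predicted_class: int,
                    context_feature: torch.Tensor) -> float:
    """Reconstruction discrepancy under the predicted class's auto-encoder.

    Intuition from the abstract: a benign detection is reconstructed well by
    its own class's auto-encoder, while a perturbed detection that violates
    the learned context rules yields a large discrepancy.
    """
    ae = autoencoders[predicted_class]
    with torch.no_grad():
        recon = ae(context_feature)
    return torch.mean((recon - context_feature) ** 2).item()


if __name__ == "__main__":
    num_classes = 20  # e.g., PASCAL VOC defines 20 object classes
    aes = {c: ClassAutoEncoder() for c in range(num_classes)}

    # Toy usage: score one (random, stand-in) context feature for class 7.
    feature = torch.randn(256)
    score = detection_score(aes, predicted_class=7, context_feature=feature)
    flagged = score > 0.5  # in practice the threshold comes from a validation ROC curve
    print(f"discrepancy={score:.4f}, flagged_as_adversarial={flagged}")
```

In this reading, the ROC-AUC numbers reported in the abstract would correspond to sweeping the decision threshold over such discrepancy scores for benign versus perturbed inputs.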