Paper Title


REGroup: Rank-aggregating Ensemble of Generative Classifiers for Robust Predictions

Authors

Lokender Tiwari, Anish Madan, Saket Anand, Subhashis Banerjee

Abstract


Deep Neural Networks (DNNs) are often criticized for being susceptible to adversarial attacks. Most successful defense strategies adopt adversarial training or random input transformations that typically require retraining or fine-tuning the model to achieve reasonable performance. In this work, our investigations of intermediate representations of a pre-trained DNN lead to an interesting discovery pointing to intrinsic robustness to adversarial attacks. We find that we can learn a generative classifier by statistically characterizing the neural response of an intermediate layer to clean training samples. The predictions of multiple such intermediate-layer based classifiers, when aggregated, show unexpected robustness to adversarial attacks. Specifically, we devise an ensemble of these generative classifiers that rank-aggregates their predictions via a Borda count-based consensus. Our proposed approach uses a subset of the clean training data and a pre-trained model, and yet is agnostic to network architectures or the adversarial attack generation method. We show extensive experiments to establish that our defense strategy achieves state-of-the-art performance on the ImageNet validation set.
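The Borda count-based consensus mentioned above can be sketched as follows: each intermediate-layer generative classifier ranks the classes, a class earns points proportional to its rank from each classifier, and the points are summed across classifiers. This is a minimal illustrative sketch of generic Borda aggregation, not the paper's implementation; function and variable names are assumptions.

```python
import numpy as np

def borda_aggregate(scores_per_classifier):
    """Aggregate class scores from several classifiers via Borda count.

    scores_per_classifier: one array of per-class scores per generative
    classifier (higher score = more likely class). Each classifier awards
    num_classes - 1 points to its top class, 0 to its bottom class; the
    class with the highest total wins. (Illustrative sketch only.)
    """
    scores_per_classifier = [np.asarray(s) for s in scores_per_classifier]
    num_classes = scores_per_classifier[0].shape[0]
    totals = np.zeros(num_classes)
    for scores in scores_per_classifier:
        order = np.argsort(scores)            # indices from worst to best class
        points = np.empty(num_classes)
        points[order] = np.arange(num_classes)  # worst -> 0, best -> n-1
        totals += points
    return int(np.argmax(totals))
```

For example, with three classifiers that disagree on the top class, the consensus goes to the class ranked consistently high overall rather than the one any single classifier prefers most strongly.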
