Paper Title
Are Adversarial Examples Created Equal? A Learnable Weighted Minimax Risk for Robustness under Non-uniform Attacks
Paper Authors
Paper Abstract
Adversarial training has proved to be an effective method of defending against adversarial examples, being one of the few defenses that withstand strong attacks. However, traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution, which is clearly unrealistic, as the attacker could choose to focus on more vulnerable examples. We present a weighted minimax risk optimization that defends against non-uniform attacks, achieving robustness against adversarial examples under perturbed test data distributions. Our modified risk considers importance weights of different adversarial examples and focuses adaptively on harder examples that are wrongly classified or at higher risk of being classified incorrectly. The designed risk allows the training process to learn a strong defense by optimizing the importance weights. The experiments show that our model significantly improves state-of-the-art adversarial accuracy under non-uniform attacks without a significant drop under uniform attacks.
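The core idea of the weighted risk can be sketched as follows: instead of averaging the adversarial loss uniformly over examples, per-example importance weights are learned (maximized) so that harder, higher-loss examples dominate the objective. The sketch below is a minimal, hypothetical instantiation using a softmax over per-example adversarial losses as the weight choice; it is not the paper's exact weighting scheme, and the function name and `temperature` parameter are illustrative assumptions.

```python
import numpy as np

def weighted_minimax_risk(adv_losses, temperature=1.0):
    """Weighted risk over per-example adversarial losses.

    A softmax over the losses is one simple way to realize the inner
    maximization over importance weights: examples with higher
    adversarial loss (harder / more likely misclassified) receive
    larger weights. `temperature` is a hypothetical knob controlling
    how sharply the risk concentrates on the hardest examples.
    """
    adv_losses = np.asarray(adv_losses, dtype=float)
    w = np.exp(adv_losses / temperature)
    w = w / w.sum()                 # weights form a distribution over examples
    return float(np.dot(w, adv_losses))

# The weighted risk upper-bounds the uniform average, since it
# shifts mass toward the hardest (highest-loss) examples.
losses = [0.1, 0.2, 2.5]
uniform_risk = sum(losses) / len(losses)
weighted_risk = weighted_minimax_risk(losses)
assert weighted_risk > uniform_risk
```

Minimizing such a weighted risk during training corresponds to defending against an attacker who concentrates its budget on the most vulnerable examples, rather than attacking uniformly at random.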