Paper Title

Orthogonal Deep Models As Defense Against Black-Box Attacks

Paper Authors

Jalwana, Mohammad A. A. K., Akhtar, Naveed, Bennamoun, Mohammed, Mian, Ajmal

Paper Abstract

Deep learning has demonstrated state-of-the-art performance for a variety of challenging computer vision tasks. On one hand, this has enabled deep visual models to pave the way for a plethora of critical applications like disease prognostics and smart surveillance. On the other, deep learning has also been found vulnerable to adversarial attacks, which calls for new techniques to defend deep models against these attacks. Among the attack algorithms, the black-box schemes are of serious practical concern since they only need publicly available knowledge of the targeted model. We carefully analyze the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model. Based on our analysis, we introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to another, even if the architectures of the two models are similar. Our unique constraint allows a model to concomitantly endeavour for higher accuracy while maintaining near orthogonal alignment of gradients with respect to a reference model. Detailed empirical study verifies that controlled misalignment of gradients under our orthogonality objective significantly boosts a model's robustness against transferable black-box adversarial attacks. In comparison to regular models, the orthogonal models are significantly more robust to a range of $l_p$ norm bounded perturbations. We verify the effectiveness of our technique on a variety of large-scale models.
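
The abstract describes the defense only at a high level: a regularizer that keeps a model's gradients nearly orthogonal to those of a reference model while the task loss is minimized. Below is a minimal PyTorch sketch of what such a gradient-orthogonality penalty could look like; the function name `orthogonality_regularized_loss`, the squared-cosine penalty, and the weight `lambda_orth` are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def orthogonality_regularized_loss(model, reference_model, x, y, lambda_orth=1.0):
    """Illustrative sketch only: cross-entropy plus a penalty on the cosine
    alignment between the input gradients of the trained model and a frozen
    reference model. Not the paper's exact objective."""
    x = x.clone().requires_grad_(True)

    # Standard task loss and its gradient w.r.t. the input, kept in the
    # autograd graph so the penalty itself can be backpropagated.
    task_loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(task_loss, x, create_graph=True)[0]

    # Input gradient of the reference model on the same batch; detached,
    # since the reference model is not being trained.
    ref_loss = F.cross_entropy(reference_model(x), y)
    ref_grad = torch.autograd.grad(ref_loss, x)[0].detach()

    # Squared cosine similarity is zero exactly when the two gradients are
    # orthogonal, so minimizing it pushes the models' gradients apart.
    cos = F.cosine_similarity(grad.flatten(1), ref_grad.flatten(1), dim=1)
    return task_loss + lambda_orth * (cos ** 2).mean()
```

In training, this combined loss would replace the plain cross-entropy term, with `lambda_orth` trading classification accuracy against gradient misalignment with the reference model, mirroring the accuracy/orthogonality trade-off the abstract describes.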
