Holistic Approach to Measure Sample-level Adversarial Vulnerability and its Utility in Building Trustworthy Systems

Paper Title

Holistic Approach to Measure Sample-level Adversarial Vulnerability and its Utility in Building Trustworthy Systems

Authors

Nayak, Gaurav Kumar; Rawal, Ruchit; Lal, Rohit; Patil, Himanshu; Chakraborty, Anirban

Abstract

An adversarial attack perturbs an image with imperceptible noise, leading to an incorrect model prediction. Recently, a few works have shown an inherent bias associated with such attacks (robustness bias), where certain subgroups in a dataset (e.g., based on class, gender, etc.) are less robust than others. This bias not only persists even after adversarial training, but often results in severe performance discrepancies across these subgroups. Existing works characterize a subgroup's robustness bias only by checking each individual sample's proximity to the decision boundary. In this work, we argue that this measure alone is not sufficient, and we validate our argument via extensive experimental analysis. It has been observed that adversarial attacks often corrupt the high-frequency components of the input image. We therefore propose a holistic approach for quantifying the adversarial vulnerability of a sample by combining these different perspectives, i.e., the degree of the model's reliance on high-frequency features and the (conventional) sample distance to the decision boundary. We demonstrate that by reliably estimating adversarial vulnerability at the sample level using the proposed holistic metric, it is possible to develop a trustworthy system where humans can be alerted about incoming samples that are highly likely to be misclassified at test time. This is achieved with better precision when our holistic metric is used than with the individual measures. To further corroborate the utility of the proposed holistic approach, we perform knowledge distillation in a limited-sample setting. We observe that the student network trained with the subset of samples selected using our combined metric performs better than both competing baselines, viz., where samples are selected randomly or based on their distances to the decision boundary.
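
The two perspectives named in the abstract can be illustrated with a minimal sketch. This is not the authors' metric: the abstract does not give its definition. Everything here is an assumption made for illustration, including the function names, the FFT low-pass filter used to probe high-frequency reliance, the probability margin used as a crude stand-in for distance to the decision boundary, the weight `alpha`, and the toy two-class model.

```python
import numpy as np


def low_pass(image, radius):
    """Keep only spatial frequencies within `radius` of the spectrum centre."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))


def high_freq_reliance(predict_proba, image, label, radius):
    """Drop in true-class probability once high frequencies are removed."""
    p_full = predict_proba(image)[label]
    p_low = predict_proba(low_pass(image, radius))[label]
    return max(0.0, float(p_full - p_low))


def margin(predict_proba, image, label):
    """Probability margin: an assumed proxy for distance to the boundary."""
    probs = np.asarray(predict_proba(image), dtype=float)
    return float(probs[label] - np.delete(probs, label).max())


def vulnerability(predict_proba, image, label, radius=4, alpha=0.5):
    """Hypothetical combined score: high reliance or a small margin raises it."""
    reliance = high_freq_reliance(predict_proba, image, label, radius)
    closeness = 1.0 - max(0.0, margin(predict_proba, image, label))
    return alpha * reliance + (1.0 - alpha) * closeness


def toy_model(image):
    """Hypothetical two-class model: class 1 probability grows with variance."""
    p1 = float(np.clip(image.var() * 8.0, 0.0, 1.0))
    return np.array([1.0 - p1, p1])


# A checkerboard is almost pure high-frequency content; a slow cosine is not.
checker = np.indices((8, 8)).sum(axis=0) % 2.0
smooth = np.tile(0.5 + 0.5 * np.cos(2 * np.pi * np.arange(8) / 8), (8, 1))

for name, img in [("checker", checker), ("smooth", smooth)]:
    print(name, vulnerability(toy_model, img, label=1, radius=2))
```

In this toy setup the checkerboard receives the higher score, because the model's confidence on it depends entirely on frequencies that the low-pass filter removes, even though both images have the same probability margin. That is the kind of case a boundary-distance measure alone would miss.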
