Paper Title
Harnessing Adversarial Distances to Discover High-Confidence Errors
Paper Authors
Paper Abstract
Given a deep neural network image classification model that we treat as a black box, and an unlabeled evaluation dataset, we develop an efficient strategy by which the classifier can be evaluated. Randomly sampling and labeling instances from an unlabeled evaluation dataset allows traditional performance measures like accuracy, precision, and recall to be estimated. However, random sampling may miss rare errors for which the model is highly confident in its prediction, but wrong. These high-confidence errors can represent costly mistakes, and therefore should be explicitly searched for. Past works have developed search techniques to find classification errors above a specified confidence threshold, but ignore the fact that errors should be expected at confidence levels anywhere below 100%. In this work, we investigate the problem of finding errors at rates greater than expected given model confidence. Additionally, we propose a query-efficient and novel search technique that is guided by adversarial perturbations to find these mistakes in black box models. Through rigorous empirical experimentation, we demonstrate that our Adversarial Distance search discovers high-confidence errors at a rate greater than expected given model confidence.
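
As a rough illustration of the idea summarized above, the following Python sketch ranks unlabeled instances so that predictions with high confidence but an unexpectedly small adversarial distance (i.e., sitting close to a decision boundary) are sent for labeling first. This is only a minimal sketch under stated assumptions, not the authors' exact algorithm: the black-box interface predict_fn, the random-noise distance estimate, and the confidence-minus-distance ranking heuristic are all hypothetical stand-ins.

import numpy as np

def adversarial_distance(x, predict_fn, step=0.01, max_steps=100, trials=10, rng=None):
    # Estimate how large a perturbation must be to flip the black-box
    # prediction for input x. A smaller radius suggests x lies closer to a
    # decision boundary. (Illustrative random-noise probe, not the paper's
    # specific perturbation method.)
    rng = np.random.default_rng(0) if rng is None else rng
    base_label, _ = predict_fn(x)
    for k in range(1, max_steps + 1):
        radius = k * step
        for _ in range(trials):
            noise = rng.normal(size=x.shape)
            noise = radius * noise / (np.linalg.norm(noise) + 1e-12)
            label, _ = predict_fn(x + noise)
            if label != base_label:
                return radius
    return max_steps * step  # no flip found within the query budget

def rank_for_labeling(unlabeled, predict_fn, budget=100):
    # Score each unlabeled instance: high reported confidence combined with a
    # small adversarial distance is treated as suspicious, so those instances
    # are labeled first when hunting for high-confidence errors.
    scored = []
    for x in unlabeled:
        _, confidence = predict_fn(x)
        dist = adversarial_distance(x, predict_fn)
        scored.append((confidence - dist, x))
    scored.sort(key=lambda t: -t[0])
    return [x for _, x in scored[:budget]]

Here predict_fn is assumed to be any callable that returns a (label, confidence) pair for an input array; in practice it would wrap queries to the black-box classifier under evaluation.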