Title

On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems

Authors

Tyukin, Ivan Y., Higham, Desmond J., Gorban, Alexander N.

Abstract

In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself. Here the perturbed system produces whatever output is desired by the attacker on a specific small data set, perhaps even a single input, but performs as normal on a validation set (which is unknown to the attacker). We show that in both cases, i.e., in the case of an attack based on adversarial examples and in the case of a stealth attack, the dimensionality of the AI's decision-making space is a major contributor to the AI's susceptibility. For attacks based on adversarial examples, a second crucial parameter is the absence of local concentrations in the data probability distribution, a property known as Smeared Absolute Continuity. According to our findings, robustness to adversarial examples requires either (a) the data distributions in the AI's feature space to have concentrated probability density functions or (b) the dimensionality of the AI's decision variables to be sufficiently small. We also show how to construct stealth attacks on high-dimensional AI systems that are hard to spot unless the validation set is made exponentially large.
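
The abstract's first attack class is easy to make concrete. Below is a minimal, self-contained sketch of a gradient-sign adversarial example against a toy linear softmax classifier; this is a generic illustration, not the paper's framework, and the weights, dimensions, and step size are arbitrary assumed values. The perturbation is sized just large enough to cross the decision boundary.

```python
# Toy illustration of an adversarial example: a small input perturbation
# that flips the decision of a linear softmax classifier.
# (Illustrative sketch only; not the construction from the paper.)
import numpy as np

rng = np.random.default_rng(0)

d, k = 50, 10                       # input dimension, number of classes
W = rng.standard_normal((k, d))     # hypothetical "trained" weights
b = rng.standard_normal(k)

x = rng.standard_normal(d)          # a clean input
logits = W @ x + b
clean_class = int(np.argmax(logits))
runner_up = int(np.argsort(logits)[-2])

# Gradient of the margin (runner-up logit minus winning logit) w.r.t. x.
grad = W[runner_up] - W[clean_class]

# Step just far enough along sign(grad) to push the runner-up past the
# winner: moving each coordinate by eps changes the margin by eps*||grad||_1.
margin = logits[clean_class] - logits[runner_up]
eps = 1.05 * margin / np.abs(grad).sum()
x_adv = x + eps * np.sign(grad)

adv_class = int(np.argmax(W @ x_adv + b))
print(f"clean class: {clean_class} -> adversarial class: {adv_class}")
print(f"per-coordinate step: {eps:.4f}, "
      f"relative L2 change: {np.linalg.norm(x_adv - x) / np.linalg.norm(x):.4f}")
```

Since the L1 norm of the logit gradient typically grows with the input dimension, the per-coordinate step needed to cross the boundary shrinks as the dimension grows; this is only an informal echo of the abstract's point that high dimensionality contributes to susceptibility.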
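The second attack class can also be sketched in a few lines. The toy construction below is an assumption-laden illustration in the spirit of the stealth attacks described above, not the authors' actual construction: it perturbs a one-hidden-layer ReLU network by appending a single neuron that activates only when an input correlates almost perfectly with an attacker-chosen trigger. On random validation inputs in high dimension such correlations are exponentially unlikely, so the perturbed network looks unmodified.

```python
# Toy illustration of a stealth attack: perturb the network itself by adding
# one ReLU neuron that fires only on an attacker-chosen trigger input.
# (Illustrative sketch only; not the construction from the paper.)
import numpy as np

rng = np.random.default_rng(1)
d, h, k = 200, 64, 10              # input dim, hidden width, classes

# Hypothetical "trained" one-hidden-layer ReLU network.
W1, b1 = rng.standard_normal((h, d)) / np.sqrt(d), np.zeros(h)
W2, b2 = rng.standard_normal((k, h)) / np.sqrt(h), np.zeros(k)

def predict(x, W1, b1, W2, b2):
    return int(np.argmax(W2 @ np.maximum(W1 @ x + b1, 0) + b2))

# Attacker picks a trigger input and a desired target class.
trigger = rng.standard_normal(d)
trigger /= np.linalg.norm(trigger)     # unit norm for a clean threshold
target_class = 3

# Extra neuron: weights = trigger, bias = -(1 - delta). Its pre-activation,
# <trigger, x> - (1 - delta), is positive only when x correlates with the
# trigger almost perfectly -- exponentially unlikely for random unit inputs.
delta = 0.05
W1a = np.vstack([W1, trigger])
b1a = np.append(b1, -(1.0 - delta))
out_weight = np.zeros(k)
out_weight[target_class] = 1000.0      # large gain forces the target class
W2a = np.hstack([W2, out_weight[:, None]])

# The trigger is now classified as the attacker wishes...
print("trigger before/after:",
      predict(trigger, W1, b1, W2, b2),
      predict(trigger, W1a, b1a, W2a, b2))

# ...while a validation set (unknown to the attacker) behaves as before.
val = rng.standard_normal((1000, d))
val /= np.linalg.norm(val, axis=1, keepdims=True)
unchanged = sum(predict(v, W1, b1, W2, b2) == predict(v, W1a, b1a, W2a, b2)
                for v in val)
print(f"validation predictions unchanged: {unchanged}/1000")
```

In this sketch the added neuron stays silent on every validation input, which mirrors the abstract's claim that such perturbations are hard to spot unless the validation set is made exponentially large.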
