Paper Title
An Algorithm for Out-Of-Distribution Attack to Neural Network Encoder
Paper Authors
Paper Abstract
Deep neural networks (DNNs), especially convolutional neural networks, have achieved superior performance on image classification tasks. However, such performance is only guaranteed if the input to a trained model is similar to the training samples, i.e., the input follows the probability distribution of the training set. Out-Of-Distribution (OOD) samples do not follow the distribution of the training set, and therefore the predicted class labels on OOD samples become meaningless. Classification-based methods have been proposed for OOD detection; however, in this study we show that this type of method has no theoretical guarantee and is practically breakable by our OOD Attack algorithm because of the dimensionality reduction in DNN models. We also show that Glow likelihood-based OOD detection is breakable as well.
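The abstract attributes the attack's feasibility to dimensionality reduction: an encoder mapping a high-dimensional input space to a lower-dimensional latent space cannot be injective, so distinct inputs can share a latent code. The sketch below is not the paper's algorithm, only a toy illustration of that underlying fact: for a hypothetical linear "encoder" `W` (latent dimension m < input dimension n), we shift an OOD input along the null space of `W` so that its latent code exactly matches that of an in-distribution sample, while the input itself remains far from it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 8                          # input dim > latent dim
W = rng.standard_normal((m, n))      # toy linear "encoder" (hypothetical)

x_in = rng.standard_normal(n)        # stand-in for an in-distribution sample
x_ood = rng.standard_normal(n) * 5.0 # stand-in for an OOD sample

# Remove the component of (x_ood - x_in) that W can "see" (its row space),
# so the adjusted input differs from x_in only within the null space of W.
pinv = np.linalg.pinv(W)
x_adv = x_ood - pinv @ (W @ (x_ood - x_in))

# Latent codes now coincide, even though the inputs are far apart.
print(np.allclose(W @ x_adv, W @ x_in))   # same latent representation
print(np.linalg.norm(x_adv - x_in))       # large input-space distance
```

For a nonlinear DNN encoder no closed-form null-space projection exists, which is why an iterative attack algorithm (as the paper proposes) is needed; the toy linear case merely shows why dimensionality reduction leaves room for such collisions.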