Paper title
TSS: Transformation-Specific Smoothing for Robustness Certification
Paper authors
Paper abstract
As machine learning (ML) systems become pervasive, safeguarding their security is critical. However, it has recently been demonstrated that motivated adversaries can mislead ML systems by perturbing test data with semantic transformations. While a rich body of research provides provable robustness guarantees for ML models against $\ell_p$ norm bounded adversarial perturbations, guarantees against semantic perturbations remain largely underexplored. In this paper, we provide TSS, a unified framework for certifying ML robustness against general adversarial semantic transformations. First, depending on the properties of each transformation, we divide common transformations into two categories: resolvable (e.g., Gaussian blur) and differentially resolvable (e.g., rotation) transformations. For the former, we propose transformation-specific randomized smoothing strategies and obtain strong robustness certification. The latter category covers transformations that involve interpolation errors, and we propose a novel approach based on stratified sampling to certify robustness. Our framework TSS leverages these certification strategies and combines them with consistency-enhanced training to provide rigorous certification of robustness. We conduct extensive experiments on over ten types of challenging semantic transformations and show that TSS significantly outperforms the state of the art. Moreover, to the best of our knowledge, TSS is the first approach that achieves nontrivial certified robustness on the large-scale ImageNet dataset. For instance, our framework achieves 30.4% certified robust accuracy against rotation attacks (within $\pm 30^\circ$) on ImageNet. Finally, to consider a broader range of transformations, we show that TSS is also robust against adaptive attacks and unforeseen image corruptions such as those in CIFAR-10-C and ImageNet-C.
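The idea of smoothing over transformation parameters, rather than over pixel noise, can be illustrated with a minimal sketch. This is not the authors' implementation. The brightness transformation, the threshold classifier, and the names `brightness`, `base_classifier`, and `smoothed_predict` are all hypothetical stand-ins chosen for illustration. The sketch only computes the majority-vote prediction of a smoothed classifier and the empirical frequency of the winning class. It does not derive a certified radius, which requires the statistical bounds developed in the paper.

```python
import random
from collections import Counter


def brightness(image, delta):
    """Hypothetical transformation: shift every pixel by `delta`, clipped to [0, 1]."""
    return [min(1.0, max(0.0, p + delta)) for p in image]


def base_classifier(image):
    """Toy stand-in for a trained model: class 1 if the mean pixel exceeds 0.5."""
    return 1 if sum(image) / len(image) > 0.5 else 0


def smoothed_predict(image, transform, classifier, sigma, n_samples, rng):
    """Majority vote of `classifier` over randomly transformed copies of `image`.

    The transformation parameter is drawn from a zero-mean Gaussian with
    standard deviation `sigma` (an assumed smoothing distribution).
    Returns the winning label and its empirical vote frequency.
    """
    votes = Counter()
    for _ in range(n_samples):
        param = rng.gauss(0.0, sigma)  # sample a transformation parameter
        votes[classifier(transform(image, param))] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = [0.8, 0.9, 0.7, 0.85]  # a tiny "bright" image, flattened to a list
label, top_freq = smoothed_predict(
    image, brightness, base_classifier, sigma=0.1, n_samples=1000, rng=rng
)
print(label, top_freq)
```

A high vote frequency for the winning class is the kind of quantity a certification procedure would lower-bound with a confidence interval before turning it into a guarantee over a range of transformation parameters.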