Paper Title

Revisiting Neuron Coverage Metrics and Quality of Deep Neural Networks

Authors

Zhou Yang, Jieke Shi, Muhammad Hilmi Asyrofi, David Lo

Abstract

Deep neural networks (DNNs) have been widely applied in modern life, including critical domains like autonomous driving, making it essential to ensure the reliability and robustness of DNN-powered systems. As an analogy to code coverage metrics for testing conventional software, researchers have proposed neuron coverage metrics and coverage-driven methods to generate DNN test cases. However, Yan et al. doubt the usefulness of existing coverage criteria in DNN testing. They show that a coverage-driven method is less effective than a gradient-based method in terms of both uncovering defects and improving model robustness. In this paper, we conduct a replication study of the work by Yan et al. and extend the experiments for deeper analysis. A larger model and a dataset of higher-resolution images are included to examine the generalizability of the results. We also extend the experiments with more test case generation techniques and adjust the process of improving model robustness to be closer to the practical life cycle of DNN development. Our experimental results confirm the conclusion of Yan et al. that coverage-driven methods are less effective than gradient-based methods. Yan et al. find that retraining with gradient-based methods cannot repair defects uncovered by coverage-driven methods. They attribute this to the fact that the two types of methods use different perturbation strategies: gradient-based methods perform differentiable transformations, while coverage-driven methods can perform additional non-differentiable transformations. We test several hypotheses and further show that even when coverage-driven methods are constrained to perform only differentiable transformations, the uncovered defects still cannot be repaired by adversarial training with gradient-based methods. Thus, defensive strategies for coverage-driven methods should be further studied.
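The abstract refers to neuron coverage metrics without defining them. As a rough illustration only, the sketch below computes one basic form of neuron coverage as it is commonly described: the fraction of neurons whose activation exceeds a threshold on at least one test input. The function name `neuron_coverage`, the default threshold of 0.5, and the toy activation values are assumptions made for this example; they are not taken from the paper, which may use different or additional criteria.

```python
from typing import Sequence


def neuron_coverage(
    activations: Sequence[Sequence[float]], threshold: float = 0.5
) -> float:
    """Fraction of neurons activated above `threshold` by at least one input.

    `activations[i][j]` is the (hypothetical, already scaled) output of
    neuron j on test input i. The threshold value is an assumption.
    """
    if not activations:
        return 0.0
    num_neurons = len(activations[0])
    if num_neurons == 0:
        return 0.0

    # A neuron counts as covered once any single input activates it.
    covered = [False] * num_neurons
    for row in activations:
        for j, value in enumerate(row):
            if value > threshold:
                covered[j] = True
    return sum(covered) / num_neurons


# Toy data: two test inputs, four neurons.
# Neurons 0 and 2 exceed 0.5 at least once; neurons 1 and 3 never do.
example = [
    [0.9, 0.1, 0.2, 0.0],
    [0.3, 0.4, 0.7, 0.1],
]
coverage = neuron_coverage(example)
print(coverage)
```

A coverage-driven test generator, in this simplified view, would prefer new inputs that raise this number, whereas a gradient-based method perturbs inputs along the loss gradient without tracking coverage at all.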
