Importance-Driven Deep Learning System Testing

Paper Title

Importance-Driven Deep Learning System Testing

Authors

Simos Gerasimou, Hasan Ferit Eniser, Alper Sen, Alper Cakan

Abstract

Deep Learning (DL) systems are key enablers for engineering intelligent applications because they can solve complex tasks such as image recognition and machine translation. Nevertheless, using DL systems in safety- and security-critical applications requires testing evidence of their dependable operation. Recent research in this direction focuses on adapting testing criteria from traditional software engineering as a means of increasing confidence in their correct behaviour. However, these criteria are inadequate for capturing the intrinsic properties exhibited by DL systems. We bridge this gap by introducing DeepImportance, a systematic testing methodology accompanied by an Importance-Driven (IDC) test adequacy criterion for DL systems. Applying IDC makes it possible to establish a layer-wise functional understanding of the importance of DL system components and to use this information to assess the semantic diversity of a test set. Our empirical evaluation on several DL systems, across multiple DL datasets and with state-of-the-art adversarial generation techniques, demonstrates the usefulness and effectiveness of DeepImportance and its ability to support the engineering of more robust DL systems.

