Test-based Patch Clustering for Automatically-Generated Patches Assessment

Paper title

Test-based Patch Clustering for Automatically-Generated Patches Assessment

Authors

Matias Martinez, Maria Kechagia, Anjana Perera, Justyna Petke, Federica Sarro, Aldeida Aleti

Abstract

Previous studies have shown that Automated Program Repair (APR) techniques suffer from the overfitting problem. Overfitting occurs when a patched program passes the whole test suite, yet the patch does not actually fix the underlying bug, or it introduces a new defect that the test suite does not cover. Patches generated by APR tools therefore need to be validated by human programmers, which can be very costly and hinders the adoption of APR tools in practice. Our work aims to minimize the number of plausible patches that programmers have to review, thereby reducing the time required to find a correct patch. We introduce a novel lightweight test-based patch clustering approach called xTestCluster, which clusters patches based on their dynamic behavior. xTestCluster is applied after the patch generation phase to analyze the patches generated by one or more repair tools and to provide more information about those patches in order to facilitate patch assessment. The novelty of xTestCluster lies in using information from the execution of newly generated test cases to cluster patches generated by multiple APR approaches. Each cluster consists of patches that fail on the same generated test cases. The output of xTestCluster gives developers (a) a way of reducing the number of patches to analyze, as they can focus on a sample of patches from each cluster, and (b) additional information attached to each patch. After analyzing 902 plausible patches from 21 Java APR tools, our results show that xTestCluster reduces the number of patches to review and analyze by a median of 50%. xTestCluster can save a significant amount of time for developers who have to review the multitude of patches generated by APR tools, and it provides them with new test cases that expose the differences in behavior between generated patches.
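The clustering rule stated in the abstract (patches that fail on the same generated test cases fall into the same cluster) can be illustrated with a minimal sketch. This is not the xTestCluster implementation: the function name `cluster_patches`, the patch and test identifiers, and the input format (a mapping from each patch to the set of generated tests it fails) are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_patches(failures):
    """Group patches by the exact set of generated tests they fail.

    `failures` is a hypothetical input: a dict mapping a patch id to the
    set of generated test ids that fail when run against that patch.
    Returns a sorted list of clusters, each a sorted list of patch ids.
    """
    clusters = defaultdict(list)
    for patch_id, failed_tests in failures.items():
        # Patches with identical failing-test sets share one cluster key.
        clusters[frozenset(failed_tests)].append(patch_id)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Made-up example data, not results from the paper.
    failures = {
        "patch_a": {"gen_test_3"},
        "patch_b": {"gen_test_3"},
        "patch_c": set(),  # passes every generated test
        "patch_d": {"gen_test_1", "gen_test_3"},
    }
    for members in cluster_patches(failures):
        print(members)
```

Under this sketch, a reviewer would inspect one representative per cluster rather than every patch, and the generated tests that distinguish two clusters show where the patches behave differently.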
