Evaluating Search-Based Software Microbenchmark Prioritization

Paper Title

Evaluating Search-Based Software Microbenchmark Prioritization

Authors

Christoph Laaber, Tao Yue, Shaukat Ali

Abstract

Ensuring that software performance does not degrade after a code change is paramount. A solution is to regularly execute software microbenchmarks, a performance testing technique similar to (functional) unit tests, which, however, often becomes infeasible due to extensive runtimes. To address that challenge, research has investigated regression testing techniques, such as test case prioritization (TCP), which reorder the execution within a microbenchmark suite to detect larger performance changes sooner. Such techniques are either designed for unit tests and perform sub-par on microbenchmarks or require complex performance models, drastically reducing their potential application. In this paper, we empirically evaluate single- and multi-objective search-based microbenchmark prioritization techniques to understand whether they are more effective and efficient than greedy, coverage-based techniques. For this, we devise three search objectives, i.e., coverage to maximize, coverage overlap to minimize, and historical performance change detection to maximize. We find that search algorithms (SAs) are only competitive with but do not outperform the best greedy, coverage-based baselines. However, a simple greedy technique utilizing solely the performance change history (without coverage information) is equally or more effective than the best coverage-based techniques while being considerably more efficient, with a runtime overhead of less than 1%. These results show that simple, non-coverage-based techniques are a better fit for microbenchmarks than complex coverage-based techniques.
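
To make the history-only greedy idea from the abstract more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a hypothetical `history` mapping from each microbenchmark name to the absolute performance changes (in percent) observed in earlier versions, and it ranks benchmarks by the mean of those changes. The function name, the data layout, and the choice of the mean as the aggregate are all illustrative assumptions, since the abstract does not specify them.

```python
from statistics import mean


def prioritize_by_history(history: dict[str, list[float]]) -> list[str]:
    """Order benchmarks by descending mean historical change size.

    Assumption: `history` maps a benchmark name to the absolute performance
    changes (in percent) seen in earlier versions. Aggregating with the mean
    is an illustrative choice, not taken from the paper.
    """
    def score(name: str) -> float:
        changes = history[name]
        # Benchmarks without any recorded history get the lowest score.
        return mean(changes) if changes else 0.0

    # Highest score first; ties are broken by name for a deterministic order.
    return sorted(history, key=lambda name: (-score(name), name))


# Hypothetical benchmark names and change sizes, for illustration only.
history = {
    "benchParse": [12.0, 3.0, 9.0],
    "benchSerialize": [1.0, 0.5],
    "benchHash": [],
    "benchSort": [25.0, 5.0],
}
order = prioritize_by_history(history)
print(order)
```

Because the ranking needs only past measurement results and a sort, no coverage information has to be collected, which fits the low overhead the abstract reports for the history-based technique.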
