CIRCLE: Continual Repair across Programming Languages

Paper title

CIRCLE: Continual Repair across Programming Languages

Authors

Wei Yuan, Quanjun Zhang, Tieke He, Chunrong Fang, Nguyen Quoc Viet Hung, Xiaodong Hao, Hongzhi Yin

Abstract

Automatic Program Repair (APR) aims to fix buggy source code with less manual debugging effort, and plays a vital role in improving software reliability and development productivity. Recent APR work has achieved remarkable progress by applying deep learning (DL), particularly neural machine translation (NMT) techniques. However, we observe that existing DL-based APR models suffer from at least two severe drawbacks: (1) Most of them can only generate patches for a single programming language; as a result, repairing multiple languages requires building and training many repair models. (2) Most of them are developed in an offline manner, so they cannot function when new requirements arrive. To address these problems, a T5-based APR framework equipped with continual learning ability across multiple programming languages is proposed, namely ContInual Repair aCross Programming LanguagEs (CIRCLE). Specifically, (1) CIRCLE uses a prompting function to narrow the gap between natural language processing (NLP) pre-training tasks and APR. (2) CIRCLE adopts a difficulty-based rehearsal strategy to achieve lifelong learning for APR without access to the full historical data. (3) An elastic regularization method is employed to further strengthen CIRCLE's continual learning ability and protect it from catastrophic forgetting. (4) CIRCLE applies a simple but effective re-repairing method to revise generation errors caused by crossing multiple programming languages. We train CIRCLE on four languages (i.e., C, Java, JavaScript, and Python) and evaluate it on five commonly used benchmarks. The experimental results demonstrate that CIRCLE not only repairs multiple programming languages effectively and efficiently in continual learning settings, but also achieves state-of-the-art performance with a single repair model.
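
To make components (2) and (3) more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how difficulty is measured or which form the elastic regularizer takes, so the sketch assumes that difficulty is the per-example training loss, that the hardest examples are the ones kept for rehearsal, and that the regularizer is an importance-weighted quadratic penalty on parameter drift. The names `select_rehearsal` and `elastic_penalty` and all of their parameters are hypothetical.

```python
from typing import Dict, List, Tuple

# A training example: (buggy code, fixed code). Hypothetical representation.
Example = Tuple[str, str]


def select_rehearsal(
    examples: List[Example], losses: List[float], budget: int
) -> List[Example]:
    """Keep a small rehearsal buffer from an already-learned language.

    Assumption: difficulty is approximated by per-example loss, and the
    `budget` highest-loss examples are retained. Ties keep input order.
    """
    if len(examples) != len(losses):
        raise ValueError("examples and losses must have the same length")
    order = sorted(range(len(examples)), key=lambda i: (-losses[i], i))
    return [examples[i] for i in order[:budget]]


def elastic_penalty(
    params: Dict[str, float],
    old_params: Dict[str, float],
    importance: Dict[str, float],
    strength: float,
) -> float:
    """Quadratic penalty on drift away from previously learned parameters.

    Assumption: each parameter has an importance weight (for example a
    Fisher-information estimate); important parameters are pulled back
    harder toward their old values.
    """
    return strength * sum(
        importance[name] * (params[name] - old_params[name]) ** 2
        for name in old_params
    )


if __name__ == "__main__":
    java_examples = [("a", "A"), ("b", "B"), ("c", "C")]
    java_losses = [0.1, 2.0, 0.7]
    buffer = select_rehearsal(java_examples, java_losses, budget=2)
    print(buffer)
    print(elastic_penalty({"w": 1.0}, {"w": 0.0}, {"w": 2.0}, strength=0.5))
```

In a continual setting of this kind, the buffer would be mixed into the training data for the next language and the penalty added to the task loss, so that a single model keeps its earlier repair ability without revisiting the full historical data.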
