Paper Title


CoditT5: Pretraining for Source Code and Natural Language Editing

Paper Authors

Jiyang Zhang, Sheena Panthaplackel, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

Paper Abstract


Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel pretraining objective which explicitly models edits and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. By outperforming standard generation-based models, we demonstrate the generalizability of our approach and its suitability for editing tasks. We also show how a standard generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance for the three downstream editing tasks.
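The reranking idea from the abstract is easy to prototype. Below is a minimal, hypothetical sketch (not the paper's exact procedure): a standard generation model proposes beam candidates, and the edit-based model rescores each candidate by sequence likelihood, keeping the one with the lowest loss. The checkpoint names are placeholders, and a faithful reproduction would score candidates in CoditT5's edit-plan output format rather than as raw target sequences.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoints for illustration; the released CoditT5
# weights may be published under a different identifier.
GEN_CKPT = "Salesforce/codet5-base"   # standard generation model
EDIT_CKPT = "path/to/coditt5"         # edit-based model (hypothetical path)

# Assumes both models share one tokenizer (CoditT5 builds on CodeT5).
tok = AutoTokenizer.from_pretrained(GEN_CKPT)
gen_model = AutoModelForSeq2SeqLM.from_pretrained(GEN_CKPT)
edit_model = AutoModelForSeq2SeqLM.from_pretrained(EDIT_CKPT)

def rerank(source: str, k: int = 5) -> str:
    """Generate k beam candidates with the generation model, then
    return the candidate the edit-based model scores best."""
    inputs = tok(source, return_tensors="pt")
    outs = gen_model.generate(
        **inputs, num_beams=k, num_return_sequences=k, max_length=128
    )
    candidates = tok.batch_decode(outs, skip_special_tokens=True)

    best, best_loss = candidates[0], float("inf")
    for cand in candidates:
        labels = tok(cand, return_tensors="pt").input_ids
        with torch.no_grad():
            # Per-token cross-entropy under the edit-based model;
            # lower loss means the model finds the edit more plausible.
            loss = edit_model(**inputs, labels=labels).loss.item()
        if loss < best_loss:
            best, best_loss = cand, loss
    return best

# Example usage on a bug-fixing-style input (toy source string):
# print(rerank("public int max(int a, int b) { return a < b ? a : b; }"))
```

The same recipe runs in the opposite direction (the generation model rescoring the edit model's candidates); the abstract reports that combining the two via such reranking yields state-of-the-art results on the three downstream editing tasks.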
