Paper Title
Non-autoregressive Model for Full-line Code Completion
Paper Authors
Paper Abstract
Code completion tools are frequently used by software developers to accelerate development by suggesting the next code elements. Completing a sequence of code tokens (e.g., a full line of code) has been shown to be more efficient than predicting a single token at a time. To complete a code sequence, researchers typically employ AutoRegressive (AR) decoders that generate tokens in a left-to-right, token-by-token fashion. Consequently, the prediction of each token depends on all previously generated tokens, which leads to high inference latency. To improve both the efficiency and accuracy of full-line code completion, in this paper we propose a Non-AutoRegressive (NAR) model for code completion, boosted by a syntax-aware sampling strategy. Experimental results on two widely used datasets show that our model outperforms both AR and NAR baselines on full-line code completion, and is up to 9 times faster than the AR model.
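The latency difference described in the abstract comes from the decoding loop structure, not from any single model's details. The following is a minimal sketch of that contrast, with a placeholder token generator standing in for a real model (the function names and placeholder tokens are illustrative assumptions, not the paper's implementation):

```python
def ar_decode(prompt, length):
    """AutoRegressive decoding: one forward pass per generated token.

    Each step conditions on all previously generated tokens, so a
    sequence of `length` tokens costs `length` sequential model calls.
    """
    tokens = list(prompt)
    for _ in range(length):
        # Stand-in for model(tokens): a real decoder would predict the
        # next token from the full context built so far.
        next_token = f"tok{len(tokens)}"
        tokens.append(next_token)
    return tokens[len(prompt):]


def nar_decode(prompt, length):
    """Non-AutoRegressive decoding: all target positions at once.

    A single forward pass predicts every position in parallel, so
    inference latency does not grow with one model call per token.
    """
    # Stand-in for one parallel model call over all `length` positions.
    return [f"tok{len(prompt) + i}" for i in range(length)]


print(ar_decode(["def", "f"], 3))   # 3 sequential stand-in "model calls"
print(nar_decode(["def", "f"], 3))  # 1 parallel stand-in "model call"
```

Both loops emit the same number of tokens; the speed-up the paper reports comes from collapsing the sequential AR loop into a single parallel prediction step.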