Paper Title
Don't Complete It! Preventing Unhelpful Code Completion for Productive and Sustainable Neural Code Completion Systems
Paper Authors
Paper Abstract
Currently, large pre-trained language models are widely applied in neural code completion systems. Although large code models significantly outperform their smaller counterparts, around 70% of the code completions displayed by GitHub Copilot are not accepted by developers. Being reviewed but not accepted, these completions contribute little to developer productivity and may even aggravate developers' workload, since state-of-the-art code completion systems generate completions automatically and proactively as developers type, once the service is enabled. Even worse, given the high cost of large code models, such unhelpful completions constitute a huge waste of computing resources and energy, which severely goes against the sustainable development principle of AI technologies. However, this waste has never been recognized, let alone effectively addressed, in the research community for neural code completion. Hence, there is an urgent need to prevent such unhelpful code completions in a cost-friendly way. To fill this significant gap, we first investigate the prompts that lead to unhelpful code completions, which we call "low-return prompts". We empirically identify four observable patterns in low-return prompts, each of which lacks necessary information and is therefore difficult to address through enhancements to the model's accuracy alone. This demonstrates the feasibility of identifying such low-return prompts based on the prompts themselves. Motivated by this finding, we propose an early-rejection mechanism that turns down low-return prompts by predicting code completion quality in advance: prompts estimated to receive unhelpful completions are not sent to the model. Furthermore, we investigate five types of estimators to demonstrate the feasibility of the mechanism. The experimental results show that the estimator can reject 20% of code completion requests with 97.4% precision.
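The early-rejection mechanism described above can be sketched as a cheap gate placed in front of the expensive model call. The snippet below is a minimal, hypothetical illustration, not the paper's actual estimator: the names `estimate_quality`, `maybe_complete`, `complete_code`, and the heuristic rules and threshold are all illustrative assumptions.

```python
def complete_code(prompt: str) -> str:
    # Placeholder for the large code model; in a real system this would
    # be an expensive inference call.
    return prompt + " ..."

def estimate_quality(prompt: str) -> float:
    """Toy stand-in for a learned quality estimator (hypothetical).

    Real estimators in the paper are trained models; here we only use
    simple heuristics to show the shape of the interface: prompts with
    very little context, or no syntactic cue about what comes next,
    score low.
    """
    score = 1.0
    if len(prompt.strip()) < 10:
        score -= 0.6  # too little context to complete usefully
    if not prompt.rstrip().endswith((".", "(", ",", "=", ":")):
        score -= 0.3  # no syntactic cue for the next token
    return max(score, 0.0)

def maybe_complete(prompt: str, reject_threshold: float = 0.5):
    """Gate the model call behind the cheap estimator.

    Prompts predicted to yield unhelpful completions are rejected early
    and never reach the model, saving compute and energy.
    """
    if estimate_quality(prompt) < reject_threshold:
        return None  # early rejection: skip the model entirely
    return complete_code(prompt)
```

In a deployed system the threshold would be tuned to trade off the rejection rate (here, the 20% of requests turned down) against the precision of the rejections.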