论文标题
通过前端JavaScript代码生成的任务增强来合并域知识
Incorporating Domain Knowledge through Task Augmentation for Front-End JavaScript Code Generation
论文作者
论文摘要
代码生成旨在从自然语言描述中自动生成代码段。通常,主流代码生成方法依赖大量的配对培训数据,包括自然语言描述和代码。但是,在某些特定领域的方案中,很难为代码生成构建如此大的配对语料库,因为没有直接可用的配对数据,并且需要大量精力来手动编写代码说明来构建高质量的培训数据集。由于培训数据有限,生成模型无法经过良好的训练,并且可能过于拟合,从而使该模型的性能不令人满意。为此,在本文中,我们提出了一种任务增强方法,该方法通过扩展原始的Tranx模型来支持suptoken级代码生成,将域知识通过辅助任务和亚键入TRANX模型纳入代码生成模型。为了验证我们提出的方法,我们收集了一个现实的代码生成数据集并在其上进行实验。我们的实验结果表明,亚句级Tranx模型在我们的数据集上优于原始Tranx模型和变压器模型,并且在我们的任务增强方法的帮助下,Subtoken-Tranx的确切匹配精度可显着提高12.75%。多种代码类别的模型性能满足了工业系统中应用的要求。我们提出的方法已由阿里巴巴的Bizcook平台采用。据我们所知,这是工业开发环境中采用的第一个域代码生成系统。
Code generation aims to generate a code snippet automatically from natural language descriptions. Generally, the mainstream code generation methods rely on a large amount of paired training data, including both the natural language description and the code. However, in some domain-specific scenarios, building such a large paired corpus for code generation is difficult because there is no directly available pairing data, and a lot of effort is required to manually write the code descriptions to construct a high-quality training dataset. Due to the limited training data, the generation model cannot be well trained and is likely to be overfitting, making the model's performance unsatisfactory for real-world use. To this end, in this paper, we propose a task augmentation method that incorporates domain knowledge into code generation models through auxiliary tasks and a Subtoken-TranX model by extending the original TranX model to support subtoken-level code generation. To verify our proposed approach, we collect a real-world code generation dataset and conduct experiments on it. Our experimental results demonstrate that the subtoken-level TranX model outperforms the original TranX model and the Transformer model on our dataset, and the exact match accuracy of Subtoken-TranX improves significantly by 12.75% with the help of our task augmentation method. The model performance on several code categories has satisfied the requirements for application in industrial systems. Our proposed approach has been adopted by Alibaba's BizCook platform. To the best of our knowledge, this is the first domain code generation system adopted in industrial development environments.