Predictive Synthesis of API-Centric Code

Paper Title

Predictive Synthesis of API-Centric Code

Authors

Daye Nam, Baishakhi Ray, Seohyun Kim, Xianshan Qu, Satish Chandra

Abstract

Today's programmers, especially data science practitioners, make heavy use of data-processing libraries (APIs) such as PyTorch, TensorFlow, NumPy, and Pandas. Program synthesizers can provide significant coding assistance to this community of users; however, program synthesis can also be slow due to enormous search spaces. In this work, we examine ways in which machine learning can be used to accelerate enumerative program synthesis. We present a deep-learning-based model to predict the sequence of API functions needed to go from a given input to a desired output, both being numeric vectors. Our work is based on two insights. First, it is possible to learn, from a large number of input-output examples, to predict the API function likely to be needed in a given situation. Second, and crucially, it is also possible to learn to compose API functions into a sequence, given an input and the desired final output, without explicitly knowing the intermediate values. We show that we can speed up an enumerative program synthesizer by using predictions from our model variants. These speedups significantly outperform previous approaches (e.g., DeepCoder) in which researchers have used ML models in enumerative synthesis.
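To make the idea of prediction-guided enumeration concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the four list operations are toy stand-ins for real library calls, the `synthesize` function and its ranking rule (product of per-function scores) are assumptions made for illustration, and the `scores` dictionary is hand-set where the paper would use predictions from a learned model.

```python
from itertools import product
from math import prod

# Toy stand-ins for API functions (hypothetical; the paper targets real
# data-processing libraries). Each maps a list of numbers to a list.
API = {
    "double": lambda xs: [2 * x for x in xs],
    "reverse": lambda xs: xs[::-1],
    "cumsum": lambda xs: [sum(xs[: i + 1]) for i in range(len(xs))],
    "sort": lambda xs: sorted(xs),
}


def run(program, xs):
    """Apply a sequence of API function names to the input, left to right."""
    for name in program:
        xs = API[name](xs)
    return xs


def synthesize(inp, out, scores, max_len=3):
    """Enumerate function sequences, shortest first. Within each length,
    try candidates in descending order of the product of their scores.
    Returns (program, number of candidates tried), or (None, tried)."""
    tried = 0
    for length in range(1, max_len + 1):
        candidates = sorted(
            product(API, repeat=length),
            key=lambda prog: -prod(scores[name] for name in prog),
        )
        for prog in candidates:
            tried += 1
            if run(prog, inp) == out:
                return list(prog), tried
    return None, tried


# Assumed scores standing in for a model's per-function predictions.
guided = {"sort": 0.4, "cumsum": 0.3, "reverse": 0.25, "double": 0.05}
uniform = {name: 1.0 for name in API}

prog_g, tried_g = synthesize([3, 1, 2], [6, 3, 1], guided)
prog_u, tried_u = synthesize([3, 1, 2], [6, 3, 1], uniform)
print(prog_g, tried_g, tried_u)
```

Both runs find the same program, but the run with informative scores reaches it after fewer candidates, because sequences built from high-scoring functions are tested first. The search only ever compares the final output against the target, which mirrors the abstract's point that intermediate values need not be known.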
