论文标题
信息理论用户互动:计划合成的重要输入
Information-theoretic User Interaction: Significant Inputs for Program Synthesis
论文作者
论文摘要
逐个示例技术正在部署在工业产品中,以实时综合各种数据转换。这些技术依靠用户提供转换任务的代表性示例。在本文中,我们需要找到最相关的问题,我们介绍了{\ em重要的问题问题},并表明它一般很难。然后,我们开发一种信息理论贪婪方法来解决问题。我们使用条件性熵结果证明了贪婪算法是合理的,该结果非正式地说,达到最大信息增益的问题是我们最了解的问题。 在交互式程序综合的上下文中,我们使用上述结果来开发{\ em {Active Program Learner}},该{\ em {Active program Learner}}生成了重要的输入,以构成每次迭代中用户的查询。该过程要求将一个{\ em {无源程序学习者}}扩展到{\ em {采样程序learner}}},该程序能够从所有一致程序的集合中采样候选程序,以估算信息增益。它还基于输入中的功能和相应的输出使用输入的聚类来采样一组候选重要输入。我们的活跃学习者能够权衡假阳性的错误负面因素,并在800个字符串转换任务的实际数据集中进行少量迭代。
Programming-by-example technologies are being deployed in industrial products for real-time synthesis of various kinds of data transformations. These technologies rely on the user to provide few representative examples of the transformation task. Motivated by the need to find the most pertinent question to ask the user, in this paper, we introduce the {\em significant questions problem}, and show that it is hard in general. We then develop an information-theoretic greedy approach for solving the problem. We justify the greedy algorithm using the conditional entropy result, which informally says that the question that achieves the maximum information gain is the one that we know least about. In the context of interactive program synthesis, we use the above result to develop an {\em{active program learner}} that generates the significant inputs to pose as queries to the user in each iteration. The procedure requires extending a {\em{passive program learner}} to a {\em{sampling program learner}} that is able to sample candidate programs from the set of all consistent programs to enable estimation of information gain. It also uses clustering of inputs based on features in the inputs and the corresponding outputs to sample a small set of candidate significant inputs. Our active learner is able to tradeoff false negatives for false positives and converge in a small number of iterations on a real-world dataset of %around 800 string transformation tasks.