Paper Title
PAL: Program-aided Language Models
Paper Authors
Paper Abstract
Large language models (LLMs) have recently demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks when provided with a few examples at test time ("few-shot prompting"). Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both to understand the problem description by decomposing it into steps and to solve each step of the problem. While LLMs seem adept at this sort of step-by-step decomposition, they often make logical and arithmetic mistakes in the solution part, even when the problem is decomposed correctly. In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. With PAL, decomposing the natural language problem into runnable steps remains the only learning task for the LLM, while solving is delegated to the interpreter. We demonstrate this synergy between a neural LLM and a symbolic interpreter across 13 mathematical, symbolic, and algorithmic reasoning tasks from BIG-Bench Hard and other benchmarks. In all these natural language reasoning tasks, generating code with an LLM and reasoning with a Python interpreter leads to more accurate results than much larger models. For example, PAL using Codex achieves state-of-the-art few-shot accuracy on the GSM8K benchmark of math word problems, surpassing PaLM-540B with chain-of-thought by an absolute 15% in top-1 accuracy. Our code and data are publicly available at http://reasonwithpal.com/.
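To make the division of labor concrete, below is a minimal, hypothetical sketch of the PAL idea in Python. The `generated_program` string stands in for a program an LLM might emit under a few-shot PAL prompt, interleaving natural-language reasoning as comments with executable statements; the helper name `run_pal_program` and the simplified prompt format are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of the PAL idea on a GSM8K-style word problem.
# `generated_program` stands in for what an LLM would emit given a
# few-shot PAL prompt; the exact prompt format is simplified here.

generated_program = """
# Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
# Each can has 3 tennis balls. How many tennis balls does he have now?
tennis_balls = 5                      # balls Roger starts with
bought_balls = 2 * 3                  # 2 cans of 3 balls each
answer = tennis_balls + bought_balls  # final answer, computed by Python
"""

def run_pal_program(program: str) -> object:
    """Offload the solution step to the Python interpreter:
    execute the model-generated program and read back `answer`.
    (Hypothetical helper; a real system would sandbox this exec.)"""
    namespace: dict = {}
    exec(program, namespace)
    return namespace["answer"]

print(run_pal_program(generated_program))  # -> 11
```

The point of the offloading is visible in the sketch: arithmetic such as `2 * 3` is evaluated exactly by the interpreter, so the LLM only has to get the step-by-step decomposition right, not the calculation.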