Code as Policies: Language Model Programs for Embodied Control

Paper title

Code as Policies: Language Model Programs for Embodied Control

Authors

Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng

Abstract

Large language models (LLMs) trained on code completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. We find that these code-writing LLMs can be re-purposed to write robot policy code, given natural language commands. Specifically, policy code can express functions or feedback loops that process perception outputs (e.g., from object detectors [2], [3]) and parameterize control primitive APIs. When provided as input with several example language commands (formatted as comments), each followed by its corresponding policy code (via few-shot prompting), LLMs can take in new commands and autonomously re-compose API calls to generate new policy code. By chaining classic logic structures and referencing third-party libraries (e.g., NumPy, Shapely) to perform arithmetic, LLMs used in this way can write robot policies that (i) exhibit spatial-geometric reasoning, (ii) generalize to new instructions, and (iii) prescribe precise values (e.g., velocities) to ambiguous descriptions ("faster") depending on context (i.e., behavioral commonsense). This paper presents code as policies: a robot-centric formulation of language model generated programs (LMPs) that can represent reactive policies (e.g., impedance controllers) as well as waypoint-based policies (vision-based pick and place, trajectory-based control), demonstrated across multiple real robot platforms. Central to our approach is prompting hierarchical code-gen (recursively defining undefined functions), which can write more complex code and also improves on the state of the art, solving 39.8% of problems on the HumanEval [1] benchmark. Code and videos are available at https://code-as-policies.github.io
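The hierarchical code-gen idea in the abstract (recursively defining undefined functions) can be illustrated with a minimal sketch. This is not the authors' implementation: the language model is replaced by a canned lookup table (`fake_llm`), the prompt format `# define function: <name>` is an assumption made for this example, and `get_pos` and `move_to` are hypothetical stand-ins for perception and control primitive APIs.

```python
import ast
import builtins
from typing import Callable, List, Set


def undefined_functions(code: str, known: Set[str]) -> Set[str]:
    """Names called in `code` that are not defined there, not in `known`, and not builtins."""
    tree = ast.parse(code)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = {
        n.func.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
    return {
        c for c in called
        if c not in defined and c not in known and not hasattr(builtins, c)
    }


def hierarchical_codegen(
    code: str, generate: Callable[[str], str], known: Set[str], max_depth: int = 3
) -> str:
    """Ask `generate` for each undefined function, recursing into the new definitions."""
    pieces: List[str] = []
    seen = set(known)

    def expand(snippet: str, depth: int) -> None:
        for name in sorted(undefined_functions(snippet, seen)):
            if depth >= max_depth:
                raise RecursionError(f"gave up defining {name!r}")
            seen.add(name)  # mark first so mutual recursion cannot loop forever
            definition = generate(f"# define function: {name}")
            expand(definition, depth + 1)  # helpers of helpers come first
            pieces.append(definition)

    expand(code, 0)
    return "\n".join(pieces + [code])


# Stand-in for a code-writing LLM (assumption: canned answers, no model call).
CANNED = {
    "# define function: get_midpoint":
        "def get_midpoint(a, b):\n    return scale(add(a, b), 0.5)\n",
    "# define function: add":
        "def add(a, b):\n    return (a[0] + b[0], a[1] + b[1])\n",
    "# define function: scale":
        "def scale(v, k):\n    return (v[0] * k, v[1] * k)\n",
}


def fake_llm(prompt: str) -> str:
    return CANNED[prompt]


# Hypothetical perception and control primitives exposed to the policy code.
POSITIONS = {"red block": (0.0, 0.0), "blue bowl": (0.4, 0.2)}
moves = []
api = {"get_pos": POSITIONS.__getitem__, "move_to": moves.append}

# Policy code as it might be generated for the command
# "move to the point halfway between the red block and the blue bowl".
policy = (
    "target = get_midpoint(get_pos('red block'), get_pos('blue bowl'))\n"
    "move_to(target)\n"
)

program = hierarchical_codegen(policy, fake_llm, known=set(api))
exec(program, dict(api))  # run the assembled policy against the stub API
print(moves)
```

In this sketch the top-level policy calls `get_midpoint`, which is undefined; its generated definition in turn calls `add` and `scale`, which are defined on the next level of recursion before the whole program is executed.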
