Paper title
How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds
Paper authors
Paper abstract
Text-based games are long puzzles or quests, characterized by a sequence of sparse and potentially deceptive rewards. They provide an ideal platform to develop agents that perceive and act upon the world using a combinatorially sized natural language state-action space. Standard Reinforcement Learning agents are poorly equipped to effectively explore such spaces and often struggle to overcome bottlenecks: states that agents are unable to pass through simply because they do not see the right action sequence enough times to be sufficiently reinforced. We introduce Q*BERT, an agent that learns to build a knowledge graph of the world by answering questions, which leads to greater sample efficiency. To overcome bottlenecks, we further introduce MC!Q*BERT, an agent that uses a knowledge-graph-based intrinsic motivation to detect bottlenecks and a novel exploration strategy to efficiently learn a chain of policy modules to overcome them. We present an ablation study and results demonstrating how our method outperforms the current state-of-the-art on nine text games, including the popular game Zork, where, for the first time, a learning agent gets past the bottleneck where the player is eaten by a Grue.