Title
Balancing Reinforcement Learning Training Experiences in Interactive Information Retrieval
Authors
Abstract
Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent that learns while interacting, a long-term and complex goal, and an algorithm that explores and adapts. To successfully apply RL methods to IIR, one challenge is obtaining sufficient relevance labels to train the RL agents, which are notoriously sample-inefficient. Moreover, in a text corpus annotated for a given query, irrelevant documents far outnumber relevant ones. This yields highly unbalanced training experiences for the agent and prevents it from learning any effective policy. Our paper addresses this issue by using domain randomization to synthesize more relevant documents for training. Our experimental results on the Text REtrieval Conference (TREC) Dynamic Domain (DD) 2017 Track show that the proposed method boosts an RL agent's learning effectiveness by 22% in dealing with unseen situations.
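The core idea, synthesizing perturbed copies of the scarce relevant documents so the agent's training experiences are less skewed toward the irrelevant class, can be illustrated with a minimal sketch. The abstract does not specify the randomization operators, so the token-dropout and local-shuffling operators below, along with all names such as `synthesize_relevant`, `drop_prob`, and `shuffle_window`, are illustrative assumptions rather than the paper's actual method.

```python
import random

def synthesize_relevant(doc_tokens, n_variants=5, drop_prob=0.1,
                        shuffle_window=3, seed=0):
    """Create randomized copies of a relevant document using two simple
    operators: random token dropout and local word-order shuffling.
    Hypothetical operators, not necessarily the paper's exact scheme."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        # Randomly drop a fraction of tokens.
        tokens = [t for t in doc_tokens if rng.random() > drop_prob]
        # Shuffle tokens within small local windows.
        for i in range(0, max(len(tokens) - shuffle_window, 0), shuffle_window):
            window = tokens[i:i + shuffle_window]
            rng.shuffle(window)
            tokens[i:i + shuffle_window] = window
        variants.append(tokens)
    return variants

# Pad the scarce relevant class before filling the RL agent's experiences.
relevant = [["dynamic", "domain", "search", "rewards", "passage", "feedback"]]
irrelevant_count = 100  # irrelevant documents vastly outnumber relevant ones
synthetic = [v for doc in relevant for v in synthesize_relevant(doc)]
print(f"{len(relevant) + len(synthetic)} relevant docs vs. {irrelevant_count} irrelevant")
```

Under this reading, the synthetic documents stand in for additional positive experiences during RL training, evening out the ratio of relevant to irrelevant feedback the agent observes.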