Title
Greedy Bandits with Sampled Context
Authors
Abstract
Bayesian strategies for contextual bandits have proved promising in single-state reinforcement learning tasks by modeling uncertainty using context information from the environment. In this paper, we propose Greedy Bandits with Sampled Context (GB-SC), a method for contextual multi-armed bandits that develops the prior from context information using Thompson Sampling and selects arms with an epsilon-greedy policy. The GB-SC framework allows for evaluation of context-reward dependency, and provides robustness to partially observable context vectors by leveraging the developed prior. Our experimental results show competitive performance on the Mushroom environment in terms of expected regret and expected cumulative regret, as well as insight into how each context subset affects decision-making.
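To make the two ingredients named in the abstract concrete, here is a minimal illustrative sketch, not the paper's actual algorithm: per-(context, arm) Beta-Bernoulli posteriors maintained by Thompson-style sampling, combined with epsilon-greedy arm selection. The class name, hyperparameters, and Bernoulli-reward assumption are all hypothetical choices for illustration.

```python
import random
from collections import defaultdict


class GreedyBanditSketch:
    """Hypothetical sketch (not the GB-SC implementation): Beta-Bernoulli
    posteriors per (context, arm), sampled Thompson-style, with an
    epsilon-greedy layer on top for arm selection."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (context, arm) -> [alpha, beta] parameters of a Beta posterior,
        # initialized to the uniform prior Beta(1, 1).
        self.posteriors = defaultdict(lambda: [1.0, 1.0])

    def select_arm(self, context):
        # With probability epsilon, explore an arm uniformly at random...
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        # ...otherwise act greedily on one posterior sample per arm.
        samples = [
            self.rng.betavariate(*self.posteriors[(context, a)])
            for a in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, context, arm, reward):
        # A Bernoulli reward in {0, 1} updates the Beta posterior counts.
        params = self.posteriors[(context, arm)]
        params[0] += reward
        params[1] += 1 - reward
```

In this sketch the sampled posterior supplies the exploration signal within the greedy step, while epsilon-greedy adds a uniform exploration floor; the real GB-SC method builds its prior from context subsets, which this toy version only gestures at by keying posteriors on the context.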