Paper title
Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health
Paper authors
Abstract
In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual approach to risk-neutral MAB. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presence of contexts that are only observable by the expert but not by the learner. Thus, such contexts are unobserved confounders (UCs) from the learner's perspective. Given a dataset generated by the expert that excludes the UCs, the goal for the learner is to identify the true minimum-risk arm with fewer online learning steps, while avoiding possible biased decisions due to the presence of UCs in the expert's data.
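The bias that unobserved confounders can introduce may be easier to see with a small numerical sketch. The example below is not the paper's method. It assumes a hypothetical two-arm problem with one binary hidden context U, and it uses the probability of a below-threshold reward as a stand-in risk measure (the abstract does not name the measure used). All probabilities, and the names P_U, P_LOW, EXPERT_POLICY, true_risk, and observed_risk, are illustrative assumptions.

```python
# Illustrative sketch only: every number below is an assumption,
# and "risk" here means P(reward falls below some threshold).

ARMS = (0, 1)

# Distribution of the hidden context U (seen by the expert, not the learner).
P_U = {0: 0.5, 1: 0.5}

# P_LOW[arm][u]: probability of a low reward when pulling `arm` in context u.
P_LOW = {
    0: {0: 0.1, 1: 0.5},
    1: {0: 0.4, 1: 0.3},
}

# EXPERT_POLICY[u][arm]: probability that the expert pulls `arm` in context u.
EXPERT_POLICY = {
    0: {0: 0.2, 1: 0.8},
    1: {0: 0.8, 1: 0.2},
}


def true_risk(arm):
    """Risk of `arm` when the learner pulls it regardless of U."""
    return sum(P_U[u] * P_LOW[arm][u] for u in P_U)


def observed_risk(arm):
    """Risk of `arm` as it appears in expert data with U removed,
    i.e. P(low reward | expert chose arm)."""
    joint = sum(P_U[u] * EXPERT_POLICY[u][arm] * P_LOW[arm][u] for u in P_U)
    marginal = sum(P_U[u] * EXPERT_POLICY[u][arm] for u in P_U)
    return joint / marginal


true_best = min(ARMS, key=true_risk)
naive_best = min(ARMS, key=observed_risk)

for arm in ARMS:
    print(f"arm {arm}: true risk {true_risk(arm):.2f}, "
          f"risk in expert data {observed_risk(arm):.2f}")
print("true minimum-risk arm:", true_best)
print("arm suggested by expert data alone:", naive_best)
```

Under these assumed numbers, the expert data make arm 1 look safer even though arm 0 has the lower risk when pulled without knowledge of U. This is the kind of biased decision the learner has to avoid while still using the expert data to reduce the number of online learning steps.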