Paper Title
Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions
Paper Authors
Paper Abstract
Users of music streaming, video streaming, news recommendation, and e-commerce services often engage with content in a sequential manner. Providing and evaluating good sequences of recommendations is therefore a central problem for these services. Prior reweighting-based counterfactual evaluation methods either suffer from high variance or make strong independence assumptions about rewards. We propose a new counterfactual estimator that allows for sequential interactions in the rewards with lower variance in an asymptotically unbiased manner. Our method uses graphical assumptions about the causal relationships of the slate to reweight the rewards in the logging policy in a way that approximates the expected sum of rewards under the target policy. Extensive experiments in simulation and on a live recommender system show that our approach outperforms existing methods in terms of bias and data efficiency for the sequential track recommendations problem.
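To make the contrast between the two families of prior reweighting methods concrete, the following is a minimal sketch, not the estimator proposed in the paper. It assumes, purely for illustration, that both policies factorize over slate positions, so that per-position action probabilities are available. The function names `ips_slate` and `independent_ips` and the toy arrays are hypothetical. The first function weights the whole slate's reward by the product of per-position probability ratios, which tends to have high variance. The second weights each position's reward only by its own ratio, which relies on rewards being independent across positions.

```python
import numpy as np


def ips_slate(rewards, logging_probs, target_probs):
    """Standard inverse propensity scoring over whole slates.

    All inputs have shape (n_slates, slate_length). The total reward of each
    slate is weighted by the product of the per-position probability ratios.
    """
    ratios = target_probs / logging_probs
    slate_weights = ratios.prod(axis=1)
    return float(np.mean(slate_weights * rewards.sum(axis=1)))


def independent_ips(rewards, logging_probs, target_probs):
    """Per-position reweighting under an independence assumption.

    Each position's reward is weighted only by that position's own ratio,
    ignoring any interaction with the rest of the slate.
    """
    ratios = target_probs / logging_probs
    return float(np.mean((ratios * rewards).sum(axis=1)))


# Toy logged data (made up for illustration): 2 slates of length 2.
rewards = np.array([[1.0, 0.0], [1.0, 1.0]])
logging_probs = np.array([[0.5, 0.5], [0.5, 0.25]])
target_probs = np.array([[0.25, 0.5], [0.5, 0.5]])

ips_value = ips_slate(rewards, logging_probs, target_probs)
iips_value = independent_ips(rewards, logging_probs, target_probs)
```

When the target policy equals the logging policy, every ratio is 1 and both functions reduce to the average logged slate reward. The two estimates diverge as the policies differ, which is where the variance and independence trade-off described in the abstract arises.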