Paper Title
Self-Supervised Reinforcement Learning for Recommender Systems
Paper Authors
Paper Abstract
In session-based or sequential recommendation, it is important to consider a number of factors, such as long-term user engagement and multiple types of user-item interactions (e.g., clicks, purchases). The current state-of-the-art supervised approaches fail to model them appropriately. Casting the sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major component of RL approaches is training the agent through interactions with the environment. However, it is often problematic to train a recommender in an online fashion due to the requirement of exposing users to irrelevant recommendations. As a result, learning the policy from logged implicit feedback is of vital importance, which is challenging due to the pure off-policy setting and the lack of negative rewards (feedback). In this paper, we propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. The RL part acts as a regularizer that drives the supervised layer to focus on specific rewards (e.g., recommending items which may lead to purchases rather than clicks), while the self-supervised layer with cross-entropy loss provides strong gradient signals for parameter updates. Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). We integrate the proposed frameworks with four state-of-the-art recommendation models. Experimental results on two real-world datasets demonstrate the effectiveness of our approach.
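The abstract describes augmenting a sequence model with two output heads: a self-supervised head trained with cross-entropy over next items, and an RL head whose Q-learning (TD) loss acts as a regularizer steering the model toward high-reward interactions such as purchases. The following is a minimal PyTorch sketch of that dual-head idea for the SQN variant only; the GRU encoder, layer sizes, reward values, and discount factor are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SQNSketch(nn.Module):
    """Sequence encoder with two output heads: a supervised (cross-entropy)
    head and a Q-learning head. The GRU encoder and all sizes are
    illustrative assumptions, not the paper's exact setup."""

    def __init__(self, num_items, hidden_size=64):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, hidden_size, padding_idx=0)
        self.encoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.supervised_head = nn.Linear(hidden_size, num_items)  # next-item logits
        self.q_head = nn.Linear(hidden_size, num_items)           # Q(state, item)

    def forward(self, item_seq):
        h, _ = self.encoder(self.item_emb(item_seq))
        state = h[:, -1, :]  # last hidden state as the user state
        return self.supervised_head(state), self.q_head(state)


def sqn_loss(model, target_model, seq, next_seq, target_item, reward, gamma=0.5):
    """Joint loss: cross-entropy on the observed next item plus a one-step
    Q-learning (TD) term that acts as a regularizer. Reward values (e.g.,
    higher for purchases than clicks) and gamma are assumed for illustration."""
    logits, q_values = model(seq)
    ce_loss = F.cross_entropy(logits, target_item)

    with torch.no_grad():  # bootstrapped TD target from a frozen copy of the model
        _, next_q = target_model(next_seq)
        td_target = reward + gamma * next_q.max(dim=1).values
    q_taken = q_values.gather(1, target_item.unsqueeze(1)).squeeze(1)
    q_loss = F.mse_loss(q_taken, td_target)

    return ce_loss + q_loss


if __name__ == "__main__":
    # Toy batch: item-id sequences, the observed next item, and its reward.
    num_items, batch = 100, 4
    model, target_model = SQNSketch(num_items), SQNSketch(num_items)
    target_model.load_state_dict(model.state_dict())

    seq = torch.randint(1, num_items, (batch, 5))
    target_item = torch.randint(1, num_items, (batch,))
    next_seq = torch.cat([seq[:, 1:], target_item.unsqueeze(1)], dim=1)
    # Assumed reward scheme: occasional "purchase" (1.0) vs. "click" (0.2).
    reward = torch.where(torch.rand(batch) > 0.9, torch.tensor(1.0), torch.tensor(0.2))

    loss = sqn_loss(model, target_model, seq, next_seq, target_item, reward)
    loss.backward()
    print(f"joint loss: {loss.item():.4f}")
```

In this sketch both heads share the same encoder, so the TD term only reshapes the shared representation while the cross-entropy term supplies the dense gradient signal, which mirrors the regularizer role the abstract assigns to the RL part.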