Paper Title
Personalizing Intervened Network for Long-tailed Sequential User Behavior Modeling
Authors
Abstract
In an era of information explosion, recommendation systems play an important role in people's daily lives by facilitating content exploration. It is well known that user activeness, i.e., the number of behaviors, tends to follow a long-tail distribution in which most users have low activeness. In practice, we observe that after joint training, tail users receive significantly lower-quality recommendations than head users. We further find that a model trained on tail users separately still achieves inferior results because of limited data. Although long-tail distributions are ubiquitous in recommendation systems, improving recommendation performance for tail users remains a challenge in both research and industry. Directly applying existing long-tail learning methods risks hurting the experience of head users. Platforms can ill afford this, since a small fraction of highly active head users contributes a considerable share of platform revenue. In this paper, we propose a novel approach that significantly improves recommendation performance for tail users while keeping performance for head users at least comparable to that of the base model. The essence of this approach is a novel Gradient Aggregation technique. It learns the common knowledge shared by all users into a backbone model, followed by separate plugin prediction networks that personalize predictions for head users and for tail users. For common knowledge learning, we leverage backdoor adjustment from causal inference to deconfound the gradient estimation. This shields backbone training from the confounder, i.e., user activeness. We conduct extensive experiments on two public recommendation benchmark datasets and a large-scale industrial dataset collected from the Alipay platform. Empirical studies validate the rationality and effectiveness of our approach.
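The abstract does not spell out how the deconfounded gradient is computed. The following is a minimal sketch of one possible reading, not the paper's implementation. It assumes the confounder Z is a binary head/tail activeness group. It also assumes that P(z) is estimated as the fraction of users (not samples) in each group. The adjustment is taken to be the standard backdoor formula applied to the expected gradient, E_do[g] = Σ_z P(z) · E[g | z]. The helpers `split_head_tail`, `group_prior`, and `backdoor_adjusted_gradient`, the `threshold` parameter, and the toy gradients are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def split_head_tail(activeness: Dict[str, int], threshold: int) -> Dict[str, str]:
    """Hypothetical split: users with at least `threshold` behaviors are 'head'."""
    return {u: ("head" if n >= threshold else "tail") for u, n in activeness.items()}


def group_prior(user_group: Dict[str, str]) -> Dict[str, float]:
    """Assumed estimate of P(z): the fraction of *users* (not samples) per group."""
    counts: Dict[str, int] = defaultdict(int)
    for z in user_group.values():
        counts[z] += 1
    n_users = len(user_group)
    return {z: c / n_users for z, c in counts.items()}


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def naive_gradient(grads: Sequence[Vector]) -> Vector:
    """Plain joint training: every sample counts equally, so head users dominate."""
    return mean_vector(grads)


def backdoor_adjusted_gradient(
    grads: Sequence[Vector], groups: Sequence[str], prior: Dict[str, float]
) -> Vector:
    """Stratify by the confounder: sum over z of P(z) * mean gradient within z."""
    by_group: Dict[str, List[Vector]] = defaultdict(list)
    for g, z in zip(grads, groups):
        by_group[z].append(g)
    dim = len(grads[0])
    total = [0.0] * dim
    for z, p_z in prior.items():
        g_z = mean_vector(by_group[z])
        for i in range(dim):
            total[i] += p_z * g_z[i]
    return total


# Toy data (made up): one highly active user and three tail users.
activeness = {"u0": 6, "u1": 1, "u2": 1, "u3": 1}
user_group = split_head_tail(activeness, threshold=3)
prior = group_prior(user_group)

# One per-sample gradient per behavior; values are illustrative only.
samples = [("u0", [1.0, 0.0])] * 6 + [(u, [0.0, 1.0]) for u in ("u1", "u2", "u3")]
grads = [g for _, g in samples]
groups = [user_group[u] for u, _ in samples]

print(naive_gradient(grads))                             # head-dominated
print(backdoor_adjusted_gradient(grads, groups, prior))  # [0.25, 0.75]
```

In this toy, the naive average follows the head user's six samples, giving about two thirds of the weight to the head direction. The adjusted estimate instead weights each activeness group by its assumed prior. The paper may define the groups, P(z), or the aggregation differently, and the separate head/tail plugin networks are not modeled here.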