Paper Title

Where Do We Go From Here? Guidelines For Offline Recommender Evaluation

Author

Schnabel, Tobias

Abstract

Various studies in recent years have pointed out major issues in the offline evaluation of recommender systems, making it difficult to assess whether true progress has been made. However, there has been little research into which set of practices should serve as a starting point during experimentation. In this paper, we examine in more detail four major issues in recommender system research, namely uncertainty estimation, generalization, hyperparameter optimization, and dataset pre-processing, to arrive at a set of guidelines. We present TrainRec, a lightweight and flexible toolkit for offline training and evaluation of recommender systems that implements these guidelines. Unlike other frameworks, TrainRec is a toolkit that focuses on experimentation alone, offering flexible modules that can be used together or in isolation. Finally, we demonstrate TrainRec's usefulness by evaluating a diverse set of twelve baselines across ten datasets. Our results show that (i) many results on smaller datasets are likely not statistically significant, (ii) there are at least three baselines that perform well on most datasets and should be considered in future experiments, and (iii) improved uncertainty quantification (via nested CV and statistical testing) rules out some reported differences between linear and neural methods. Given these results, we advocate that future research should standardize evaluation using our suggested guidelines.
