Paper Title
Extreme Regression for Dynamic Search Advertising
Paper Authors
Abstract
This paper introduces a new learning paradigm called eXtreme Regression (XR), whose objective is to accurately predict the numerical degree of relevance of each of an extremely large number of labels to a data point. XR can provide elegant solutions to many large-scale ranking and recommendation applications, including Dynamic Search Advertising (DSA). XR can learn more accurate models than the recently popular extreme classifiers, which incorrectly assume strictly binary-valued label relevances. Traditional regression metrics, which sum the errors over all the labels, are unsuitable for XR problems because they can give extremely loose bounds on label ranking quality. Moreover, existing regression algorithms do not scale efficiently to millions of labels. This paper addresses these limitations through: (1) new evaluation metrics for XR that sum only the k largest regression errors; (2) a new algorithm called XReg, which decomposes the XR task into a hierarchy of much smaller regression problems, leading to highly efficient training and prediction; and (3) a new labelwise prediction algorithm in XReg that is useful for DSA and other recommendation tasks. Experiments on benchmark datasets demonstrated that XReg can outperform state-of-the-art extreme classifiers as well as large-scale regressors and rankers, with up to a 50% reduction in the new XR error metric, and improvements of up to 2% in the propensity-scored precision metric used in extreme classification and up to 2.4% in the click-through rate metric used in DSA. Deploying XReg for DSA in Bing resulted in a relative gain of 27% in query coverage. XReg's source code can be downloaded from http://manikvarma.org/code/XReg/download.html.
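The first contribution, an XR metric that sums only the k largest regression errors, can be illustrated with a short NumPy sketch. It is not the paper's definition. It assumes absolute per-label errors, summed over the k worst labels of each data point and then averaged over data points. The paper may use squared errors or a different normalization. The function name `topk_regression_error` is hypothetical.

```python
import numpy as np


def topk_regression_error(y_true, y_pred, k):
    """Mean over data points of the sum of the k largest per-label errors.

    Hypothetical sketch: uses absolute errors |y - y_hat|; the paper's exact
    XR metric (squared errors, normalization, etc.) may differ.
    y_true, y_pred: arrays of shape (n_points, n_labels) with real-valued relevances.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.ndim != 2 or y_true.shape != y_pred.shape:
        raise ValueError("expected two arrays of shape (n_points, n_labels)")
    if k < 1:
        raise ValueError("k must be at least 1")
    n_labels = y_true.shape[1]
    k = min(k, n_labels)  # cannot take more errors than there are labels
    errors = np.abs(y_true - y_pred)
    # np.partition moves the k largest errors of each row into the last k columns
    # (in arbitrary order), which avoids a full sort over millions of labels.
    topk = np.partition(errors, n_labels - k, axis=1)[:, -k:]
    return float(topk.sum(axis=1).mean())


if __name__ == "__main__":
    # 1,000 labels, every label predicted with an error of 0.01.
    y_true = np.zeros((1, 1000))
    y_pred = np.full((1, 1000), 0.01)
    print("sum over all labels:", np.abs(y_true - y_pred).sum())
    print("top-5 error:", topk_regression_error(y_true, y_pred, k=5))
```

In this toy case the all-label sum is about 10, driven entirely by the long tail of labels, while the top-5 error is about 0.05. The top-k form only reflects the k worst errors per data point.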