Paper Title

PECOS: Prediction for Enormous and Correlated Output Spaces

Authors

Hsiang-Fu Yu, Kai Zhong, Jiong Zhang, Wei-Cheng Chang, Inderjit S. Dhillon

Abstract

Many large-scale applications amount to finding relevant results from an enormous output space of potential candidates. Examples include finding the best-matching product in a large catalog and suggesting related search phrases on a search engine. The size of the output space for these problems can range from millions to billions, and can even be infinite in some applications. Moreover, training data is often limited for the long-tail items in the output space. Fortunately, items in the output space are often correlated, thereby presenting an opportunity to alleviate the data sparsity issue. In this paper, we propose the Prediction for Enormous and Correlated Output Spaces (PECOS) framework, a versatile and modular machine learning framework for solving prediction problems for very large output spaces, and apply it to the eXtreme Multilabel Ranking (XMR) problem: given an input instance, find and rank the most relevant items from an enormous but fixed and finite output space. We propose a three-phase framework for PECOS: (i) in the first phase, PECOS organizes the output space using a semantic indexing scheme, (ii) in the second phase, PECOS uses the indexing to narrow down the output space by orders of magnitude using a machine-learned matching scheme, and (iii) in the third phase, PECOS ranks the matched items using a final ranking scheme. The versatility and modularity of PECOS allow for easy plug-and-play of various choices for the indexing, matching, and ranking phases. We also develop very fast inference procedures which allow us to perform XMR predictions in real time; for example, inference takes less than 1 millisecond per input on the dataset with 2.8 million labels. The PECOS software is available at https://libpecos.org.
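The three phases described in the abstract (index, match, rank) can be illustrated with a toy sketch. This is not the PECOS API and not the paper's exact method. It assumes hypothetical 2-D label embeddings and swaps in simple stand-ins: a single-level spherical k-means for the semantic index, centroid dot products for the machine-learned matcher, and plain dot products for the final ranker. The function names `build_index`, `match`, and `rank` and the parameters `beam_size` and `topk` are illustrative only.

```python
# Toy sketch of a three-phase XMR pipeline (index -> match -> rank).
# Hypothetical stand-ins throughout; this is NOT the libpecos API.
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products act as cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_index(
    label_emb: Dict[str, Vector], n_clusters: int, n_iter: int = 10
) -> Tuple[List[List[str]], List[Vector]]:
    """Phase 1 (indexing): group similar labels with spherical k-means.

    Farthest-point initialization keeps this toy example deterministic.
    """
    names = sorted(label_emb)
    centroids = [label_emb[names[0]]]
    while len(centroids) < n_clusters:
        # Seed the next cluster with the label least similar to all current centroids.
        far = min(names, key=lambda n: max(dot(label_emb[n], c) for c in centroids))
        centroids.append(label_emb[far])
    clusters: List[List[str]] = []
    for _ in range(n_iter):
        # Assign every label to its most similar centroid.
        clusters = [[] for _ in range(n_clusters)]
        for n in names:
            best = max(range(n_clusters), key=lambda k: dot(label_emb[n], centroids[k]))
            clusters[best].append(n)
        # Move each non-empty cluster's centroid to the normalized mean of its members.
        for k, members in enumerate(clusters):
            if members:
                dim = len(centroids[k])
                centroids[k] = normalize(
                    [sum(label_emb[m][i] for m in members) for i in range(dim)]
                )
    return clusters, centroids


def match(query: Sequence[float], centroids: List[Vector], beam_size: int) -> List[int]:
    """Phase 2 (matching): keep only the beam_size highest-scoring clusters."""
    order = sorted(range(len(centroids)), key=lambda k: dot(query, centroids[k]), reverse=True)
    return order[:beam_size]


def rank(
    query: Sequence[float],
    clusters: List[List[str]],
    matched: List[int],
    label_emb: Dict[str, Vector],
    topk: int,
) -> List[Tuple[str, float]]:
    """Phase 3 (ranking): score only labels inside matched clusters; return the top-k."""
    candidates = [n for k in matched for n in clusters[k]]
    scored = [(n, dot(query, label_emb[n])) for n in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topk]


# Hypothetical label embeddings: three footwear labels and three electronics labels.
raw = {
    "shoes": [1.0, 0.1], "sneakers": [0.9, 0.2], "boots": [1.0, 0.0],
    "laptop": [0.1, 1.0], "tablet": [0.2, 0.9], "phone": [0.0, 1.0],
}
label_emb = {name: normalize(v) for name, v in raw.items()}
clusters, centroids = build_index(label_emb, n_clusters=2)
query = [1.0, 0.3]
matched = match(query, centroids, beam_size=1)
print(rank(query, clusters, matched, label_emb, topk=2))  # sneakers first, then shoes
```

Only labels in the matched clusters are ever scored by the ranker, which is where the narrowing of the output space comes from. Raising `beam_size` scores more clusters, trading speed for recall.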