Paper Title
Climbing the WOL: Training for Cheaper Inference
Paper Authors
Paper Abstract
Efficient inference for wide output layers (WOLs) is an essential yet challenging task in large-scale machine learning. Most approaches reduce this problem to approximate maximum inner product search (MIPS), which relies heavily on the observation that, for a given model, ground-truth labels correspond to the highest-valued logits during full model inference. However, such an assumption is restrictive in practice. In this paper, we argue that approximate MIPS subroutines, despite having sub-linear computation time, are sub-optimal because they are tailored for retrieving large inner products with high recall rather than for retrieving the correct labels. With WOLs, the correct labels often have only moderate inner products, which makes approximate MIPS more challenging. We propose an alternative problem formulation, called Label Superior Sampling (LSS), where the objective is to tailor the system to ensure retrieval of the correct label. Accordingly, we propose a novel learned hash approach, which is significantly more efficient than MIPS baselines while remaining sufficient for high inference accuracy. Our extensive evaluation indicates that LSS can match or even outperform full inference accuracy with around a 5x speedup and an 87% energy reduction.
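To make the MIPS-style baseline the abstract refers to concrete, the sketch below shows how hash-based candidate retrieval replaces a full scan over a wide output layer: instead of computing all L logits, a query is hashed and only the labels whose weight rows share its bucket are scored. This is a generic SimHash (random-hyperplane) illustration, not the paper's learned hash; all names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L = 64, 10_000                 # feature dim; number of labels (wide output layer)
W = rng.standard_normal((L, d))   # output-layer weights, one row per label

# SimHash: sign patterns against random hyperplanes; nearby vectors
# (small angle) are likely to land in the same bucket.
n_bits = 12
planes = rng.standard_normal((n_bits, d))

def code(v):
    # Binary code of v as a hashable tuple of 0/1 bits.
    return tuple((planes @ v > 0).astype(np.int8))

# Offline step: index every label's weight row by its hash code.
buckets = {}
for i, w in enumerate(W):
    buckets.setdefault(code(w), []).append(i)

def approx_top1(x):
    """Score only the labels colliding with x; fall back to a full scan
    if the bucket is empty. Cost is ~bucket size instead of L."""
    cand = buckets.get(code(x), [])
    if not cand:
        return int(np.argmax(W @ x))
    cand = np.asarray(cand)
    return int(cand[np.argmax(W[cand] @ x)])
```

Note the failure mode the abstract highlights: this index is tuned to recall rows with *large* inner products against the query, so a correct label whose inner product is only moderate can miss the bucket entirely, which is the gap LSS is designed to close.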