Title
Extreme compression of sentence-transformer ranker models: faster inference, longer battery life, and less storage on edge devices
Authors
Abstract
Modern search systems use several large ranker models with transformer architectures. These models require substantial computational resources and are not suitable for deployment on devices with limited compute. Knowledge distillation, in which a large teacher model transfers knowledge to a small student model, is a popular compression technique that can reduce the resource needs of such models. To drastically reduce memory requirements and energy consumption, we propose two extensions to a popular sentence-transformer distillation procedure: generating a vocabulary of optimal size and reducing the dimensionality of the teacher's embeddings prior to distillation. We evaluate these extensions on two different types of ranker models. The result is extremely compressed student models; their evaluation on a test dataset demonstrates the significance and utility of the proposed extensions.
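To make the two proposed extensions concrete, below is a minimal sketch of how they could be wired into the standard sentence-transformers distillation loop. This is not the authors' implementation: the teacher checkpoint ("all-mpnet-base-v2"), the student checkpoint ("nreimers/TinyBERT_L-4_H-312_v2"), the 128-dimensional target size, and the placeholder training corpus are all illustrative assumptions, and the optimal-size vocabulary is only assumed to be in place (the student would be re-tokenized with the trimmed vocabulary before this step).

```python
# Minimal sketch (assumptions noted above): PCA-reduce the teacher's embedding
# dimension, then distill into a small student with an MSE loss, following the
# standard sentence-transformers distillation recipe.
import torch
from sklearn.decomposition import PCA
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

REDUCED_DIM = 128  # assumed target embedding size
# In practice this would be a large in-domain corpus; placeholders keep PCA valid.
train_sentences = [f"placeholder in-domain sentence {i}" for i in range(1000)]

# 1) Teacher with reduced embedding dimension: fit PCA on full-size teacher
#    embeddings and append the projection as a Dense layer.
teacher = SentenceTransformer("all-mpnet-base-v2")  # assumed teacher checkpoint
full_dim = teacher.get_sentence_embedding_dimension()
pca = PCA(n_components=REDUCED_DIM).fit(
    teacher.encode(train_sentences, convert_to_numpy=True)
)
projection = models.Dense(
    in_features=full_dim,
    out_features=REDUCED_DIM,
    bias=False,
    activation_function=torch.nn.Identity(),
)
projection.linear.weight = torch.nn.Parameter(
    torch.tensor(pca.components_, dtype=torch.float32)
)
teacher.add_module("pca_projection", projection)  # teacher now emits 128-dim vectors

# 2) Small student, assumed to already carry the optimal-size vocabulary; a
#    projection maps its output into the teacher's reduced embedding space.
student = SentenceTransformer("nreimers/TinyBERT_L-4_H-312_v2")  # assumed student
student.add_module(
    "projection",
    models.Dense(
        in_features=student.get_sentence_embedding_dimension(),
        out_features=REDUCED_DIM,
        activation_function=torch.nn.Identity(),
    ),
)

# 3) Distillation: the student regresses the reduced teacher embeddings (MSE).
train_examples = [
    InputExample(texts=[s], label=teacher.encode(s)) for s in train_sentences
]
loader = DataLoader(train_examples, batch_size=16, shuffle=True)
student.fit(
    train_objectives=[(loader, losses.MSELoss(model=student))],
    epochs=1,
    warmup_steps=0,
)
```

Because the PCA projection is folded into the teacher before any distillation data is generated, the student only ever sees 128-dimensional targets, which is what makes the memory and energy savings on edge devices possible in this sketch.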