Paper Title

Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models

Paper Authors

Phyllis Ang, Bhuwan Dhingra, Lisa Wu Wills

Paper Abstract

With many real-world applications of Natural Language Processing (NLP) involving long texts, there has been a rise in NLP benchmarks that measure the accuracy of models that can handle longer input sequences. However, these benchmarks do not consider the trade-offs between accuracy, speed, and power consumption as input sizes or model sizes are varied. In this work, we perform a systematic study of this accuracy vs. efficiency trade-off on two widely used long-sequence models, Longformer-Encoder-Decoder (LED) and Big Bird, during fine-tuning and inference on four datasets from the SCROLLS benchmark. To study how this trade-off differs across hyperparameter settings, we compare the models across four sequence lengths (1024, 2048, 3072, 4096) and two model sizes (base and large) under a fixed resource budget. We find that LED consistently achieves better accuracy at lower energy costs than Big Bird. For summarization, we find that increasing model size is more energy efficient than increasing sequence length for higher accuracy. However, this comes at the cost of a large drop in inference speed. For question answering, we find that smaller models are both more efficient and more accurate due to the larger training batch sizes possible under a fixed resource budget.
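As a rough illustration of the kind of measurement the abstract describes, the sketch below times LED inference at the four sequence lengths studied, using Hugging Face Transformers. It is a minimal sketch, not the authors' experimental code; the checkpoint name, generation settings, and timing approach are assumptions for illustration only.

```python
# Minimal sketch: timing LED inference at different input sequence lengths.
# The checkpoint, document, and generation settings are illustrative assumptions.
import time
import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

checkpoint = "allenai/led-base-16384"  # assumed base-size LED checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = LEDForConditionalGeneration.from_pretrained(checkpoint).eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

document = "..."  # a long input document, e.g. from a SCROLLS summarization task

for max_len in (1024, 2048, 3072, 4096):
    # Truncate the input to the sequence length being measured.
    inputs = tokenizer(
        document,
        max_length=max_len,
        truncation=True,
        return_tensors="pt",
    ).to(device)

    start = time.perf_counter()
    with torch.no_grad():
        summary_ids = model.generate(**inputs, max_new_tokens=256)
    elapsed = time.perf_counter() - start

    print(f"seq_len={max_len}: {elapsed:.2f}s, "
          f"{summary_ids.shape[-1]} output tokens")
```

A full study along these lines would also record GPU power draw (e.g. via NVML) and repeat the measurement during fine-tuning, which is where the fixed resource budget and batch-size effects discussed in the abstract come into play.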
