Long Range Arena: A Benchmark for Efficient Transformers

Paper Title

Long Range Arena: A Benchmark for Efficient Transformers

Authors

Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

Abstract

Transformers do not scale very well to long sequence lengths, largely because of quadratic self-attention complexity. In recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking on a wide spectrum of tasks and datasets makes it difficult to assess relative model quality among many models. This paper proposes a systematic and unified benchmark, LRA, specifically focused on evaluating model quality under long-context scenarios. Our benchmark is a suite of tasks consisting of sequences ranging from $1K$ to $16K$ tokens, encompassing a wide range of data types and modalities such as text, natural and synthetic images, and mathematical expressions requiring similarity, structural, and visual-spatial reasoning. We systematically evaluate ten well-established long-range Transformer models (Reformers, Linformers, Linear Transformers, Sinkhorn Transformers, Performers, Synthesizers, Sparse Transformers, and Longformers) on our newly proposed benchmark suite. LRA paves the way towards better understanding this class of efficient Transformer models, facilitates more research in this direction, and presents new challenging tasks to tackle. Our benchmark code will be released at https://github.com/google-research/long-range-arena.
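
To make the quadratic self-attention cost mentioned in the abstract concrete, the following is a minimal sketch, not taken from the paper or its code release. It counts the entries of a single head's full attention score matrix at sequence lengths spanning the benchmark's stated range. The function names are hypothetical, and the sketch assumes that $1K$ means 1024 tokens and that scores are stored as 4-byte floats.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one head's full self-attention score matrix (seq_len x seq_len)."""
    return seq_len * seq_len


def attention_score_bytes(seq_len: int, bytes_per_entry: int = 4) -> int:
    """Memory for that matrix, assuming 4-byte (float32) scores by default."""
    return attention_score_entries(seq_len) * bytes_per_entry


# Assumption: "1K" to "16K" tokens is read as 1024 to 16384.
for n in (1024, 2048, 4096, 8192, 16384):
    mib = attention_score_bytes(n) / 2**20
    print(f"{n:>6} tokens: {attention_score_entries(n):>11,} scores, {mib:7.1f} MiB per head")
```

Under these assumptions, a 16x increase in sequence length gives a 256x increase in score-matrix size, which is the scaling behaviour that efficient Transformer variants aim to avoid.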
