Paper Title
Training Transformers Together
Paper Authors
Paper Abstract
The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, we collaboratively trained a text-to-image transformer similar to OpenAI DALL-E. We invited the viewers to join the ongoing training run, showing them instructions on how to contribute using the available hardware. We explained how to address the engineering challenges associated with such a training run (slow communication, limited memory, uneven performance between devices, and security concerns) and discussed how the viewers can set up collaborative training runs themselves. Finally, we showed that the resulting model generates images of reasonable quality on a number of prompts.
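As a rough illustration of the core idea behind pooling hardware from independent parties, the sketch below simulates synchronous data-parallel training: each "peer" computes a gradient on its own local batch, the gradients are averaged (standing in for an all-reduce over the Internet), and every peer applies the same update. This is a minimal toy, not the paper's actual system; the one-parameter model, the peer data, and the function names are all hypothetical.

```python
# Toy sketch of collaborative data-parallel training.
# Model: fit the scalar w in y = w * x by least squares.
# All names and data are illustrative, not from the paper.

def local_gradient(w, batch):
    """Gradient of the mean squared error 0.5*(w*x - y)^2 over one peer's batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def collaborative_step(w, peer_batches, lr):
    """One synchronous round: average the peers' gradients, take an SGD step."""
    grads = [local_gradient(w, b) for b in peer_batches]
    avg_grad = sum(grads) / len(grads)  # stand-in for an all-reduce over the network
    return w - lr * avg_grad

# Three hypothetical "peers", each holding a few points from the line y = 3x.
peers = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (4.0, 12.0)],
]

w = 0.0
for _ in range(200):
    w = collaborative_step(w, peers, lr=0.05)
# w converges toward 3.0
```

Real collaborative runs replace the in-process averaging with fault-tolerant communication between machines and must also handle the issues the abstract lists: slow links, limited memory, uneven device speeds, and untrusted participants.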