Paper Title
Over-the-Air Federated Learning from Heterogeneous Data
Paper Authors
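Tomer Sery, Nir Shlezinger, Kobi Cohen, Yonina C. Eldar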
Paper Abstract
Federated learning (FL) is a framework for the distributed training of centralized models. In FL, a set of edge devices trains a model using their local data while repeatedly exchanging the trained updates with a central server. This procedure allows a centralized model to be tuned in a distributed fashion without the users sharing their possibly private data. In this paper, we focus on over-the-air (OTA) FL, which has recently been suggested to reduce the communication overhead of FL that stems from the repeated transmission of model updates by a large number of users over the wireless channel. In OTA FL, all users simultaneously transmit their updates as analog signals over a multiple access channel, and the server receives a superposition of the transmitted signals. However, this approach lets the channel noise directly affect the optimization procedure, which may degrade the accuracy of the trained model. We develop a Convergent OTA FL (COTAF) algorithm that enhances the common local stochastic gradient descent (SGD) FL algorithm by introducing precoding at the users and scaling at the server, which gradually mitigates the effect of the noise. We analyze the convergence of COTAF to the loss-minimizing model and quantify the effect of a statistically heterogeneous setup, i.e., when the training data of each user obeys a different distribution. Our analysis reveals the ability of COTAF to achieve a convergence rate similar to that achievable over error-free channels. Our simulations demonstrate the improved convergence of COTAF over vanilla OTA local SGD when training on non-synthetic datasets. Furthermore, we numerically show that the precoding induced by COTAF notably improves both the convergence rate and the accuracy of models trained via OTA FL.
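To make the precoding-and-scaling mechanism described above concrete, here is a minimal NumPy sketch of one possible OTA local SGD loop with a COTAF-style time-varying precoding factor. The power budget P, the noise level noise_std, the toy least-squares objective, and the helper local_sgd are all illustrative assumptions, not the paper's exact setup or notation.

import numpy as np

rng = np.random.default_rng(0)

K, d = 10, 5            # number of users, model dimension
P = 1.0                 # per-round transmit power budget (assumed)
noise_std = 0.1         # additive channel noise level (assumed)

def local_sgd(theta, data, steps=5, lr=0.1):
    # A few local SGD steps on a toy least-squares objective.
    A, b = data
    for _ in range(steps):
        grad = A.T @ (A @ theta - b) / len(b)
        theta = theta - lr * grad
    return theta

# Statistically heterogeneous data: each user has its own linear model.
users = []
for _ in range(K):
    A = rng.normal(size=(20, d))
    b = A @ rng.normal(size=d) + 0.05 * rng.normal(size=20)
    users.append((A, b))

theta = np.zeros(d)
for t in range(50):
    # Users run local SGD, then form their model updates.
    updates = [local_sgd(theta, users[k]) - theta for k in range(K)]

    # COTAF-style precoding: a common factor alpha_t scales the updates
    # to meet the power budget; as the updates shrink over rounds,
    # alpha_t grows, so the (fixed) channel noise is suppressed.
    alpha_t = P / (max(float(u @ u) for u in updates) + 1e-12)
    tx = [np.sqrt(alpha_t) * u for u in updates]

    # Multiple access channel: server receives the noisy superposition.
    y = np.sum(tx, axis=0) + noise_std * rng.normal(size=d)

    # Server-side scaling inverts the precoding and averages the users.
    theta = theta + y / (K * np.sqrt(alpha_t))

The design intuition in this sketch: the noise enters the model with effective standard deviation noise_std / (K * sqrt(alpha_t)), and since alpha_t grows as the local updates shrink across rounds, the effective noise decays over time, which is the gradual noise mitigation the abstract refers to.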