Paper Title

DYNAFED: Tackling Client Data Heterogeneity with Global Dynamics

Paper Authors

Renjie Pi, Weizhong Zhang, Yueqi Xie, Jiahui Gao, Xiaoyu Wang, Sunghun Kim, Qifeng Chen

Paper Abstract

The Federated Learning (FL) paradigm is known to face challenges under heterogeneous client data. Local training on non-iid distributed data results in deflected local optima, which causes the client models to drift further away from each other and degrades the aggregated global model's performance. A natural solution is to gather all client data onto the server, such that the server has a global view of the entire data distribution. Unfortunately, this reduces to regular centralized training, which compromises clients' privacy and conflicts with the purpose of FL. In this paper, we put forth an idea to collect and leverage global knowledge on the server without hindering data privacy. We unearth such knowledge from the dynamics of the global model's trajectory. Specifically, we first reserve a short trajectory of global model snapshots on the server. Then, we synthesize a small pseudo dataset such that a model trained on it mimics the dynamics of the reserved global model trajectory. Afterward, the synthesized data is used to help aggregate the deflected clients into the global model. We name our method Dynafed, which enjoys the following advantages: 1) it does not rely on any external on-server dataset, so no additional cost for data collection is incurred; 2) the pseudo data can be synthesized in early communication rounds, which enables Dynafed to take effect early, boosting convergence and stabilizing training; 3) the pseudo data only needs to be synthesized once and can be directly utilized on the server to help aggregation in subsequent rounds. Extensive experiments across multiple benchmarks are conducted to showcase the effectiveness of Dynafed. We also provide insights into the underlying mechanism of our method.
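To make the mechanism concrete, below is a minimal PyTorch (>= 2.0) sketch of the two server-side steps the abstract describes: synthesizing pseudo data by matching the reserved global-model trajectory, and using it to refine the aggregated model each round. The function names (`synthesize_pseudo_data`, `server_round`), all hyperparameters, the squared-parameter-distance matching objective, the one-snapshot-ahead matching horizon, and the uniform FedAvg weighting are illustrative assumptions rather than the authors' reference implementation; the sketch also assumes a model without buffers (e.g., a plain ConvNet without batch normalization).

```python
# Minimal sketch of Dynafed's two server-side steps. Names, hyperparameters,
# and the exact matching loss are illustrative assumptions, not the paper's
# reference implementation. Requires PyTorch >= 2.0 for torch.func.
import torch
import torch.nn.functional as F
from torch.func import functional_call


def synthesize_pseudo_data(model, trajectory, num_samples, sample_shape,
                           num_classes, outer_steps=200, inner_steps=10,
                           inner_lr=0.01, data_lr=0.1):
    """Optimize a small pseudo dataset so that short training runs on it
    reproduce the reserved trajectory of global-model snapshots (a list of
    state_dicts). Assumes the model has parameters only, no buffers."""
    x_syn = torch.randn(num_samples, *sample_shape, requires_grad=True)
    y_syn = torch.randint(0, num_classes, (num_samples,))  # fixed labels
    opt = torch.optim.Adam([x_syn], lr=data_lr)
    for _ in range(outer_steps):
        # Sample a segment theta_t -> theta_{t+1} of the reserved trajectory.
        t = torch.randint(0, len(trajectory) - 1, (1,)).item()
        params = {k: v.clone().requires_grad_(True)
                  for k, v in trajectory[t].items()}
        # Differentiable inner SGD: unroll a few steps on the pseudo data.
        for _ in range(inner_steps):
            logits = functional_call(model, params, (x_syn,))
            loss = F.cross_entropy(logits, y_syn)
            grads = torch.autograd.grad(loss, list(params.values()),
                                        create_graph=True)
            params = {k: p - inner_lr * g
                      for (k, p), g in zip(params.items(), grads)}
        # Trajectory matching: the reached parameters should land on the
        # next real snapshot (squared distance as an assumed objective).
        target = trajectory[t + 1]
        match_loss = sum(((params[k] - target[k]) ** 2).sum() for k in params)
        opt.zero_grad()
        match_loss.backward()
        opt.step()
    return x_syn.detach(), y_syn


def server_round(global_model, client_states, pseudo_data,
                 refine_steps=50, refine_lr=0.01):
    """Aggregate client models (uniform FedAvg here for simplicity), then
    refine the aggregate on the synthesized pseudo data."""
    avg = {k: torch.stack([s[k] for s in client_states]).mean(0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    x_syn, y_syn = pseudo_data
    opt = torch.optim.SGD(global_model.parameters(), lr=refine_lr)
    for _ in range(refine_steps):
        loss = F.cross_entropy(global_model(x_syn), y_syn)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return global_model
```

Consistent with advantages 2) and 3) above, `synthesize_pseudo_data` consumes only early-round snapshots and runs once; every subsequent call to `server_round` reuses the same pseudo data to correct the aggregate.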
