Paper Title
Towards Understanding Quality Challenges of the Federated Learning for Neural Networks: A First Look from the Lens of Robustness
Paper Authors
Paper Abstract
Federated learning (FL) is a distributed learning paradigm that preserves users' data privacy while leveraging the entire dataset of all participants. In FL, multiple models are trained independently on the clients and aggregated centrally to update a global model in an iterative process. Although this approach is excellent at preserving privacy, FL still suffers from quality issues such as attacks or Byzantine faults. Recent attempts have been made to address such quality challenges via robust aggregation techniques for FL. However, the effectiveness of state-of-the-art (SOTA) robust FL techniques is still unclear, and a comprehensive study is lacking. Therefore, to better understand the current quality status and challenges of these SOTA FL techniques in the presence of attacks and faults, we perform a large-scale empirical study that investigates SOTA FL's quality from multiple angles: attacks, simulated faults (via mutation operators), and aggregation (defense) methods. In particular, we study FL's performance on image classification tasks, using DNNs as the model type. Furthermore, we conduct our study on two generic image datasets and one real-world federated medical image dataset. We also investigate the effect of the proportion of affected clients and of the dataset distribution on the robustness of FL. After a large-scale analysis covering 496 configurations, we find that most mutators applied on individual clients have a negligible effect on the final model for the generic datasets, and only one of them is effective on the medical dataset. Furthermore, we show that model poisoning attacks are more effective than data poisoning attacks. Moreover, the most robust FL aggregator depends on the attack and the dataset. Finally, we illustrate that a simple ensemble of aggregators achieves a more robust solution than any single aggregator and is the best choice in 75% of the cases.
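To make the aggregation setting concrete, below is a minimal Python/NumPy sketch of one FL round under a simulated model poisoning attack. Everything in it is an illustrative assumption, not the paper's exact implementation: the flattened 5-dimensional updates, the 8-honest/2-poisoned client split, the three aggregators (plain FedAvg mean, coordinate-wise median, trimmed mean), and the median-of-candidates ensemble rule are common robust-aggregation baselines chosen for demonstration.

import numpy as np

# One simulated FL aggregation round over flattened client updates.
# Hypothetical setup: 8 honest clients and 2 clients mounting a model
# poisoning attack by submitting updates shifted away from the honest ones.

def fed_avg(updates):
    # Plain federated averaging: coordinate-wise mean of all client updates.
    return np.mean(updates, axis=0)

def coordinate_median(updates):
    # Robust baseline: coordinate-wise median, resistant to outliers
    # in each coordinate independently.
    return np.median(updates, axis=0)

def trimmed_mean(updates, trim_ratio=0.2):
    # Robust baseline: drop the k smallest and k largest values per
    # coordinate, then average what remains.
    k = int(len(updates) * trim_ratio)
    sorted_updates = np.sort(updates, axis=0)
    return np.mean(sorted_updates[k:len(updates) - k], axis=0)

def ensemble_aggregate(updates, aggregators):
    # A simple ensemble: run every aggregator, then take the
    # coordinate-wise median of the candidate global updates.
    candidates = np.stack([agg(updates) for agg in aggregators])
    return np.median(candidates, axis=0)

rng = np.random.default_rng(0)
honest = rng.normal(loc=0.0, scale=0.1, size=(8, 5))    # benign updates near 0
poisoned = rng.normal(loc=5.0, scale=0.1, size=(2, 5))  # attacker updates shifted to 5
updates = np.vstack([honest, poisoned])

aggregators = [fed_avg, coordinate_median, trimmed_mean]
print("FedAvg:  ", fed_avg(updates))
print("Median:  ", coordinate_median(updates))
print("Trimmed: ", trimmed_mean(updates))
print("Ensemble:", ensemble_aggregate(updates, aggregators))

Running this, the two poisoned clients drag the plain FedAvg mean far from the honest updates, while the median, trimmed mean, and the ensemble stay near zero; the study performs this kind of aggregator comparison at scale across its 496 configurations.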