Federated learning with incremental clustering for heterogeneous data

Paper title

Federated learning with incremental clustering for heterogeneous data

Authors

Castellon, Fabiola Espinoza; Mayoue, Aurelien; Sublemontier, Jacques-Henri; Gouy-Pailler, Cedric

Abstract

Federated learning enables different parties to collaboratively build a global model under the orchestration of a server while keeping the training data on clients' devices. However, performance suffers when clients have heterogeneous data. To cope with this problem, we assume that despite data heterogeneity, there are groups of clients with similar data distributions that can be clustered. In previous approaches, the server requires clients to send their parameters simultaneously in order to cluster them. However, this can be problematic when there are many participants who may have limited availability. To prevent such a bottleneck, we propose FLIC (Federated Learning with Incremental Clustering), in which the server exploits the updates sent by clients during federated training instead of asking them to send their parameters simultaneously. Hence no communication between the server and the clients is necessary beyond what classical federated learning requires. We empirically demonstrate for various non-IID cases that our approach successfully splits clients into groups following the same data distributions. We also identify the limitations of FLIC by studying its ability to partition clients efficiently at the early stages of the federated learning process. We further address attacks on models as a form of data heterogeneity and empirically show that FLIC is a robust defense against poisoning attacks even when the proportion of malicious clients is higher than 50%.
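
To illustrate the general idea of clustering clients from updates as they arrive, rather than collecting all parameters at once, here is a minimal Python sketch. It is not the FLIC algorithm: the abstract does not specify the clustering rule, so the cosine-similarity criterion, the `threshold` parameter, the running-mean centroids, and the names `IncrementalClusterer` and `observe` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two flat update vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class IncrementalClusterer:
    """Hypothetical server-side helper: assigns each incoming update to a cluster.

    Assumption: an update joins the most similar existing cluster if its cosine
    similarity to that cluster's centroid is at least `threshold`; otherwise it
    starts a new cluster. This rule is illustrative, not taken from the paper.
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.centroids = []   # running mean of the updates in each cluster
        self.counts = []      # number of updates folded into each centroid
        self.assignment = {}  # client_id -> cluster index

    def observe(self, client_id, update):
        # Find the most similar centroid that clears the threshold.
        best, best_sim = None, self.threshold
        for k, centroid in enumerate(self.centroids):
            sim = cosine(update, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim

        if best is None:
            # No cluster is close enough: open a new one seeded by this update.
            self.centroids.append(list(update))
            self.counts.append(1)
            best = len(self.centroids) - 1
        else:
            # Fold the update into the running mean of the chosen cluster.
            n = self.counts[best] + 1
            self.centroids[best] = [
                c + (x - c) / n for c, x in zip(self.centroids[best], update)
            ]
            self.counts[best] = n

        self.assignment[client_id] = best
        return best


# Updates arrive one at a time, as they would during ordinary federated rounds.
clusterer = IncrementalClusterer(threshold=0.5)
first = clusterer.observe("client_a", [1.0, 0.1])
second = clusterer.observe("client_b", [-1.0, 0.0])
third = clusterer.observe("client_c", [0.9, 0.0])
```

In this toy run, `client_a` and `client_c` send updates pointing in nearly the same direction and end up in one cluster, while `client_b` is placed in a separate one. Because the server only reads updates it already receives, a scheme of this shape adds no extra communication rounds.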
