Paper Title

Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis

Authors

Chen, Yuanxing; Zhang, Qingzhao; Ma, Shuangge; Fang, Kuangnan

Abstract

In diverse fields ranging from finance to omics, it is increasingly common that data are distributed across multiple individual sources (referred to as ``clients'' in some studies). Integrating raw data, although powerful, is often not feasible, for example, when privacy protection is a concern. Distributed learning techniques have been developed to integrate summary statistics as opposed to raw data. In many existing distributed learning studies, it is stringently assumed that all clients share the same model. To accommodate data heterogeneity, some federated learning methods allow for client-specific models. In this article, we consider the scenario in which clients form clusters: clients in the same cluster share the same model, and different clusters have different models. Further considering the clustering structure can lead to a better understanding of the ``interconnections'' among clients and reduce the number of parameters. To this end, we develop a novel penalization approach. Specifically, group penalization is imposed for regularized estimation and selection of important variables, and fusion penalization is imposed to automatically cluster clients. An effective ADMM algorithm is developed, and estimation, selection, and clustering consistency are established under mild conditions. Simulation and data analysis further demonstrate the practical utility and superiority of the proposed approach.
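To make the penalization structure concrete, the display below is a minimal sketch of the type of objective the abstract describes, combining a group penalty for variable selection with a fusion penalty for clustering clients. The notation is assumed for illustration rather than taken from the paper: $M$ clients with client-specific coefficient vectors $\beta_1,\ldots,\beta_M \in \mathbb{R}^p$, client-wise losses $L_m$ computed from client $m$'s data or summary statistics, tuning parameters $\lambda_1,\lambda_2$, and a (possibly concave) penalty function $\rho$.

\[
\min_{\beta_1,\ldots,\beta_M}\; \sum_{m=1}^{M} L_m(\beta_m)
\;+\; \lambda_1 \sum_{j=1}^{p} \bigl\| (\beta_{1j},\ldots,\beta_{Mj}) \bigr\|_2
\;+\; \lambda_2 \sum_{1\le m<m'\le M} \rho\bigl( \| \beta_m - \beta_{m'} \|_2 \bigr)
\]

In this sketch, the group term encourages the $j$-th variable to be selected or dropped jointly across clients, while the fusion term shrinks the coefficient vectors of similar clients toward one another; clients whose estimates coincide at the solution form an estimated cluster. The coupled, nonsmooth penalties are what motivate an ADMM-type algorithm, which is a standard way to handle such terms by introducing auxiliary variables for the fused differences and updating them separately.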
