Kernel Biclustering Algorithm in Hilbert Spaces

Paper Title

Kernel Biclustering algorithm in Hilbert Spaces

Authors

Marcos Matabuena, J. C. Vidal, Oscar Hernan Madrid Padilla, Dino Sejdinovic

Abstract

Biclustering algorithms partition data and covariates simultaneously, providing new insights in several domains, such as the analysis of gene expression to discover new biological functions. This paper develops a new model-free biclustering algorithm in abstract spaces using the notions of energy distance (ED) and maximum mean discrepancy (MMD), two distances between probability distributions that can handle complex data such as curves or graphs. The proposed method can learn more general and complex cluster shapes than most approaches in the existing literature, which usually focus on detecting differences in means and variances. Although the biclustering configurations of our approach are constrained to create disjoint structures at the datum and covariate levels, the results are competitive. Assuming a proper kernel choice, our results are similar to those of state-of-the-art methods in their optimal scenarios, and our method outperforms them when cluster differences are concentrated in higher-order moments. The model's performance has been tested in several situations involving simulated and real-world datasets. Finally, new theoretical consistency results are established using tools from the theory of optimal transport.
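The abstract builds on two distances between probability distributions, ED and MMD. The sketch below is only an illustration of those two ingredients. It is not the authors' biclustering procedure. It computes plug-in (V-statistic) estimates of the squared MMD and of the energy distance for real-valued samples. Several choices are assumptions made for illustration: the Gaussian kernel and its bandwidth of 0.5, the helper names (`gaussian_kernel`, `distance_kernel`, `mmd2_biased`, `energy_distance`), and the restriction to scalar data. The paper works in abstract spaces such as curves or graphs, where `abs(x - y)` and the kernel would be replaced by a metric and a kernel defined on that space. The demo compares a standard normal sample with a Rademacher (+/-1) sample. Both have mean 0 and variance 1, so they differ only in higher-order moments, which is the kind of difference the abstract highlights.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=0.5):
    """Gaussian (RBF) kernel on real numbers; the bandwidth is an illustrative choice."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def distance_kernel(x, y, z0=0.0):
    """Distance-induced kernel 0.5 * (|x - z0| + |y - z0| - |x - y|)."""
    return 0.5 * (abs(x - z0) + abs(y - z0) - abs(x - y))


def _abs_diff(a, b):
    """Euclidean distance on the real line."""
    return abs(a - b)


def _mean_pairwise(f, xs, ys):
    """Average of f(a, b) over all pairs (a, b), diagonal included (V-statistic)."""
    return sum(f(a, b) for a in xs for b in ys) / (len(xs) * len(ys))


def mmd2_biased(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (_mean_pairwise(kernel, xs, xs)
            + _mean_pairwise(kernel, ys, ys)
            - 2.0 * _mean_pairwise(kernel, xs, ys))


def energy_distance(xs, ys):
    """V-statistic estimate of 2E|X - Y| - E|X - X'| - E|Y - Y'|."""
    return (2.0 * _mean_pairwise(_abs_diff, xs, ys)
            - _mean_pairwise(_abs_diff, xs, xs)
            - _mean_pairwise(_abs_diff, ys, ys))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 300
    # Two N(0, 1) samples and one Rademacher sample: all share mean 0 and
    # variance 1, so any detected difference comes from higher-order moments.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    zs = [rng.choice((-1.0, 1.0)) for _ in range(n)]

    print("MMD^2, same law:     ", mmd2_biased(xs, ys))
    print("MMD^2, different law:", mmd2_biased(xs, zs))
    print("ED, same law:        ", energy_distance(xs, ys))
    print("ED, different law:   ", energy_distance(xs, zs))
    # Energy distance equals twice the squared MMD under the distance kernel.
    print("2 * MMD^2 (distance kernel):",
          2.0 * mmd2_biased(xs, zs, kernel=distance_kernel))
```

The last printed line illustrates a known identity. The energy distance equals twice the squared MMD computed with the distance-induced kernel `0.5 * (|x - z0| + |y - z0| - |x - y|)`, and this also holds exactly for the plug-in estimates. The identity is one way to see how the two notions in the abstract are related. With a characteristic kernel such as the Gaussian, the population MMD is zero only when the two distributions coincide.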
