Repulsive Mixture Models of Exponential Family PCA for Clustering

Paper Title

Repulsive Mixture Models of Exponential Family PCA for Clustering

Authors

Maoying Qiao, Tongliang Liu, Jun Yu, Wei Bian, Dacheng Tao

Abstract

The mixture extension of exponential family principal component analysis (EPCA) is designed to encode more structural information about the data distribution than traditional EPCA does. For example, because the basic form of EPCA is linear, it cannot easily handle nonlinear cluster structures, whereas the mixture extension models them explicitly. However, the traditional mixture of local EPCAs suffers from model redundancy, i.e., overlaps among mixing components, which may cause ambiguity in data clustering. To alleviate this problem, this paper introduces a repulsiveness-encouraging prior among the mixing components and develops a diversified EPCA mixture (DEPCAM) model in the Bayesian framework. Specifically, a determinantal point process (DPP) is exploited as a diversity-encouraging prior distribution over the joint local EPCAs. As required, a matrix-valued measure for the L-ensemble kernel is designed, within which $\ell_1$ constraints are imposed to facilitate the selection of effective principal components (PCs) of the local EPCAs, and an angular-based similarity measure is proposed. An efficient variational EM algorithm is derived to perform parameter learning and hidden variable inference. Experimental results on both synthetic and real-world datasets confirm the effectiveness of the proposed method in terms of model parsimony and generalization ability on unseen test data.
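To illustrate why a DPP acts as a repulsive prior, the following minimal sketch scores a set of component directions by the log-determinant of an L-ensemble kernel. It is not the kernel designed in the paper: the squared-cosine similarity, the use of a single direction per component, and the function names `angular_kernel` and `dpp_log_prior` are all assumptions made for illustration.

```python
import numpy as np

def angular_kernel(directions):
    """Squared-cosine similarity between directions (illustrative kernel, an assumption).

    Squaring makes the similarity invariant to the sign of each direction, and the
    result stays positive semidefinite (elementwise product of two Gram matrices).
    """
    d = np.asarray(directions, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # normalize each row to unit length
    return (d @ d.T) ** 2

def dpp_log_prior(directions):
    """Unnormalized L-ensemble DPP log-density: log det(L) over the chosen set."""
    sign, logdet = np.linalg.slogdet(angular_kernel(directions))
    return logdet if sign > 0 else -np.inf

# Hypothetical example: two mixture components, one direction each.
diverse = [[1.0, 0.0], [0.0, 1.0]]      # orthogonal directions
overlapping = [[1.0, 0.0], [1.0, 0.1]]  # nearly parallel directions

print(dpp_log_prior(diverse))
print(dpp_log_prior(overlapping))
```

Under these assumptions, the nearly parallel pair receives a much lower log-density than the orthogonal pair, which is the sense in which such a prior penalizes overlapping mixture components.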
