Paper title
On Large-Scale Dynamic Topic Modeling with Nonnegative CP Tensor Decomposition
Paper authors
Abstract
There is currently an unprecedented demand for large-scale temporal data analysis due to the explosive growth of data. Dynamic topic modeling has been widely used in the social and data sciences with the goal of learning latent topics that emerge, evolve, and fade over time. Previous work on dynamic topic modeling primarily employs nonnegative matrix factorization (NMF), where slices of the data tensor are each factorized into the product of lower-dimensional nonnegative matrices. With this approach, however, information contained in the temporal dimension of the data is often neglected or underutilized. To overcome this issue, we propose instead adopting nonnegative CANDECOMP/PARAFAC (CP) tensor decomposition (NNCPD), where the data tensor is directly decomposed into a minimal sum of outer products of nonnegative vectors, thereby preserving the temporal information. The viability of NNCPD is demonstrated through application to both synthetic and real data, where significantly improved results are obtained compared to those of typical NMF-based methods. The advantages of NNCPD over such approaches are studied and discussed. To the best of our knowledge, this is the first time that NNCPD has been utilized for the purpose of dynamic topic modeling, and our findings will be transformative for both applications and further developments.
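To make the decomposition described in the abstract concrete, the following is a minimal sketch of a nonnegative CP decomposition of a three-way tensor into a sum of outer products of nonnegative vectors. It rests on several assumptions that are not taken from the paper: the tensor is laid out as terms x documents x time, the solver uses simple multiplicative updates (a common choice for nonnegative factorizations, not necessarily the authors' algorithm), and the function name `nncpd`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def nncpd(X, rank, n_iter=200, eps=1e-12, seed=0):
    """Hypothetical sketch: nonnegative CP of a 3-way tensor via multiplicative updates.

    Approximates X[i, j, k] by sum_r A[i, r] * B[j, r] * C[k, r] with A, B, C >= 0.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    A = rng.random((I, rank)) + 0.1
    B = rng.random((J, rank)) + 0.1
    C = rng.random((K, rank)) + 0.1
    errors = []
    for _ in range(n_iter):
        # Each update multiplies a factor by a ratio of nonnegative terms,
        # so the factor stays nonnegative throughout.
        A *= np.einsum('ijk,jr,kr->ir', X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
        # Track the relative reconstruction error after each sweep.
        X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
        errors.append(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
    return A, B, C, errors

# Synthetic example (assumed sizes): 30 terms, 20 documents, 10 time steps, 3 topics.
rng = np.random.default_rng(42)
A_true = rng.random((30, 3))
B_true = rng.random((20, 3))
C_true = rng.random((10, 3))
X = np.einsum('ir,jr,kr->ijk', A_true, B_true, C_true)

A, B, C, errors = nncpd(X, rank=3)
print("relative error after first sweep:", errors[0])
print("relative error after last sweep: ", errors[-1])
```

Under these assumptions, each column of `A` can be read as a topic's term weights and the matching column of `C` as that topic's strength over time, which is the temporal information that slice-by-slice NMF does not model jointly.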