Paper Title

Knowledge-Aware Bayesian Deep Topic Model

Authors

Dongsheng Wang, Yishi Xu, Miaoge Li, Zhibin Duan, Chaojie Wang, Bo Chen, Mingyuan Zhou

Abstract

We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Although embedded topic models (ETMs) and their variants have achieved promising performance in text analysis, they mainly focus on mining word co-occurrence patterns, ignoring potentially easy-to-obtain prior topic hierarchies that could help enhance topic coherence. While several knowledge-based topic models have recently been proposed, they are either only applicable to shallow hierarchies or sensitive to the quality of the provided prior knowledge. To this end, we develop a novel deep ETM that jointly models the documents and the given prior knowledge by embedding the words and topics into the same space. Guided by the provided knowledge, the proposed model tends to discover topic hierarchies that are organized into interpretable taxonomies. Moreover, with a technique for adapting a given graph, our extended version allows the provided prior topic structure to be fine-tuned to match the target corpus. Extensive experiments show that our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
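The key mechanism the abstract relies on, embedding words and topics into the same space, follows the standard ETM decoder: each topic's word distribution is a softmax over inner products between word and topic embeddings. The sketch below illustrates only that decoder step; the dimensions and random embeddings are illustrative placeholders, not values from the paper, and the full Bayesian hierarchical model is not reproduced here.

```python
import numpy as np

# Illustrative sketch of an ETM-style decoder: words and topics share one
# embedding space, and a topic's word distribution is a softmax over the
# inner products of its embedding with every word embedding.
# All sizes and values below are made up for demonstration.

rng = np.random.default_rng(0)
vocab_size, n_topics, embed_dim = 1000, 20, 50

rho = rng.normal(size=(vocab_size, embed_dim))    # word embeddings
alpha = rng.normal(size=(n_topics, embed_dim))    # topic embeddings

logits = rho @ alpha.T                            # (vocab_size, n_topics) inner products
beta = np.exp(logits - logits.max(axis=0, keepdims=True))
beta /= beta.sum(axis=0, keepdims=True)           # column-wise softmax over the vocabulary

# Each column of beta is one topic's distribution over the vocabulary.
assert np.allclose(beta.sum(axis=0), 1.0)
```

Because topics and words live in the same space, prior knowledge about topic hierarchies can constrain the topic embeddings `alpha` directly, which is what makes the joint modeling in the paper possible.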
