Unification of HDP and LDA Models for Optimal Topic Clustering of Subject Specific Question Banks

Paper title

Unification of HDP and LDA Models for Optimal Topic Clustering of Subject Specific Question Banks

Authors

Fernandes, Nikhil; Gkolia, Alexandra; Pizzo, Nicolas; Davenport, James; Nair, Akshar

Abstract

There is an increasingly popular trend in universities towards curriculum transformation that makes teaching more interactive and better suited to online courses. As online courses grow in popularity, the number of course-related queries that academics receive will increase. In addition, if lectures are delivered in a video-on-demand format, there is no fixed time at which the majority of students can ask questions. When questions are asked during a live lecture, the chance of receiving similar questions repeatedly is negligible, but when they are asked asynchronously this is more likely. To reduce the time spent answering each individual question, clustering the questions is an ideal choice. Several unsupervised models are suited to text clustering, of which the Latent Dirichlet Allocation (LDA) model is the most commonly used. We use the Hierarchical Dirichlet Process (HDP) to determine an optimal topic-number input for our LDA model runs. Because these topic models are probabilistic, their outputs vary between runs. The general trend we found is that not all topics were used for clustering on the first run of the LDA model, which results in less effective clustering. To tackle the probabilistic output, we recursively apply the LDA model to the effective topics in use until we obtain an efficiency ratio of 1. Through our experimental results, we also establish reasoning for how Zeno's paradox is avoided.
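
The recursive step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the efficiency ratio, so it is assumed here to be the number of topics actually assigned to at least one question divided by the number of topics requested. The name `recursive_topic_count` and the `fit` callable are hypothetical; `fit` stands in for any LDA run that returns one dominant topic id per question, and `initial_k` stands in for the topic count suggested by HDP.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (not from the paper): given the questions and a
# topic count k, return one dominant topic id in range(k) per question.
FitFn = Callable[[Sequence[str], int], List[int]]


def recursive_topic_count(
    docs: Sequence[str], initial_k: int, fit: FitFn
) -> Tuple[int, List[int]]:
    """Re-run the topic model until every requested topic is in use.

    Assumption: efficiency ratio = topics in use / topics requested.
    """
    if initial_k < 1 or len(docs) == 0:
        raise ValueError("need at least one topic and one document")

    k = initial_k
    while True:
        assignments = fit(docs, k)
        used = len(set(assignments))  # topics that received a question
        if used >= k:                 # efficiency ratio of 1: stop
            return k, assignments
        k = used                      # retry with only the effective topics
```

In this sketch the loop always terminates: `k` is a positive integer that strictly decreases on every retry, so it cannot shrink indefinitely. This is a property of the sketch and is not claimed to be the paper's own reasoning about Zeno's paradox.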

Paper title