Paper Title
Classification of Important Segments in Educational Videos using Multimodal Features
Paper Authors
Paper Abstract
Videos are a commonly used type of content for learning during Web search. Many e-learning platforms provide quality content, but educational videos are sometimes long and cover many topics. Humans are good at extracting important sections from videos, but this remains a significant challenge for computers. In this paper, we address the problem of assigning importance scores to video segments, that is, how much information they contain with respect to the overall topic of an educational video. We present an annotation tool and a new dataset of annotated educational videos collected from popular online learning platforms. Moreover, we propose a multimodal neural architecture that utilizes state-of-the-art audio, visual, and textual features. Our experiments investigate the impact of visual and temporal information, as well as the combination of multimodal features, on importance prediction.
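To make the segment-importance formulation concrete, the sketch below shows one plausible way to score segments by fusing per-segment audio, visual, and text embeddings. This is not the authors' architecture; the feature dimensions, layer sizes, and late-fusion design are illustrative assumptions only.

```python
# Minimal sketch (assumed, not the paper's model): a late-fusion regressor that
# maps per-segment audio, visual, and text embeddings to an importance score.
import torch
import torch.nn as nn

class MultimodalImportanceScorer(nn.Module):
    # All dimensions below are hypothetical placeholders.
    def __init__(self, audio_dim=128, visual_dim=512, text_dim=768, hidden_dim=256):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Fuse the concatenated projections and regress a score in [0, 1].
        self.fusion = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, audio, visual, text):
        # Each input has shape (batch, num_segments, modality_dim).
        h = torch.cat(
            [self.audio_proj(audio), self.visual_proj(visual), self.text_proj(text)],
            dim=-1,
        )
        return self.fusion(h).squeeze(-1)  # (batch, num_segments)

# Example: score 10 segments of one video using random placeholder features.
model = MultimodalImportanceScorer()
scores = model(torch.randn(1, 10, 128), torch.randn(1, 10, 512), torch.randn(1, 10, 768))
print(scores.shape)  # torch.Size([1, 10])
```

A temporal model (e.g., a recurrent or attention layer over segments) could replace the per-segment fusion head to capture the temporal information the abstract mentions; the sketch omits this for brevity.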