Paper title
VIDI: A Video Dataset of Incidents
Paper authors
Paper abstract
Automatic detection of natural disasters and incidents has become increasingly important as a tool for fast response. Many studies detect incidents using still images and text, but few approaches exploit temporal information. One of the main reasons is the lack of a diverse video dataset covering various incident types. To address this need, we present the Video Dataset of Incidents (VIDI), which contains 4,534 video clips corresponding to 43 incident categories. Each incident class has around 100 videos, with an average duration of ten seconds. To increase diversity, the videos were searched for in several languages. To assess the performance of recent state-of-the-art approaches, Vision Transformer and TimeSformer, and to explore the contribution of video-based information to incident classification, we performed benchmark experiments on VIDI and the Incidents Dataset. We show that these recent methods improve incident classification accuracy and that employing video data is very beneficial for the task: using video data raises the top-1 accuracy to 76.56%, from the 67.37% obtained using a single frame. VIDI will be made publicly available. Additional materials can be found at the following link: https://github.com/vididataset/VIDI.