Paper Title
Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models
Paper Authors
Paper Abstract
Eye-tracking has potential to provide rich behavioral data about human cognition in ecologically valid environments. However, analyzing this rich data is often challenging. Most automated analyses are specific to simplistic artificial visual stimuli with well-separated, static regions of interest, while most analyses in the context of complex visual stimuli, such as most natural scenes, rely on laborious and time-consuming manual annotation. This paper studies using computer vision tools for "attention decoding", the task of assessing the locus of a participant's overt visual attention over time. We provide a publicly available Multiple Object Eye-Tracking (MOET) dataset, consisting of gaze data from participants tracking specific objects, annotated with labels and bounding boxes, in crowded real-world videos, for training and evaluating attention decoding algorithms. We also propose two end-to-end deep learning models for attention decoding and compare these to state-of-the-art heuristic methods.
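To make the attention-decoding task concrete, here is a minimal sketch of a per-frame heuristic decoder: given a gaze sample and the annotated bounding boxes for that frame, it outputs the label of the box containing the gaze point, falling back to the box with the nearest center. The Box and decode_frame names are hypothetical illustrations of the task setup, not the MOET data API or the heuristic baselines evaluated in the paper.

```python
# Sketch of a simple heuristic attention decoder (illustrative only; not the
# paper's method). For each frame, the decoded target is the object whose
# bounding box contains the gaze point, else the object with the nearest center.
from dataclasses import dataclass
from typing import List, Optional, Tuple
import math


@dataclass
class Box:
    label: str   # object label, e.g. "car"
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, gx: float, gy: float) -> bool:
        return self.x1 <= gx <= self.x2 and self.y1 <= gy <= self.y2

    def center_distance(self, gx: float, gy: float) -> float:
        cx, cy = (self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0
        return math.hypot(gx - cx, gy - cy)


def decode_frame(gaze: Tuple[float, float], boxes: List[Box]) -> Optional[str]:
    """Return the label of the attended object for one frame, or None if no boxes."""
    if not boxes:
        return None
    gx, gy = gaze
    hits = [b for b in boxes if b.contains(gx, gy)]
    if hits:
        # If several boxes contain the gaze point, prefer the closest center.
        return min(hits, key=lambda b: b.center_distance(gx, gy)).label
    return min(boxes, key=lambda b: b.center_distance(gx, gy)).label


# Example: a gaze sample landing inside the "bus" box.
frame_boxes = [Box("car", 0, 0, 100, 80), Box("bus", 120, 20, 260, 140)]
print(decode_frame((150.0, 60.0), frame_boxes))  # -> "bus"
```

Applied frame by frame, such a rule yields a label sequence over time; the end-to-end models proposed in the paper instead learn this mapping directly from gaze and video input.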