Paper Title
Crowd Scene Analysis by Output Encoding
Paper Authors
Abstract
Crowd scene analysis has received growing attention because of its wide range of applications. Knowing accurate crowd locations, rather than merely the crowd count, is important for spatially identifying high-risk regions in congested scenes. In this paper, we propose a Compressed Sensing based Output Encoding (CSOE) scheme, which casts the detection of pixel coordinates of small objects as a signal regression task in an encoded signal space. CSOE helps to boost localization performance when targets are highly crowded but do not vary greatly in scale. In addition, because human sizes vary, proper receptive field sizes are crucial for crowd analysis. We create Multiple Dilated Convolution Branches (MDCB) that offer a set of different receptive field sizes, improving localization accuracy when object sizes change drastically within an image. We also develop an Adaptive Receptive Field Weighting (ARFW) module, which further addresses the scale variation issue by adaptively emphasizing informative channels that have a proper receptive field size. Experiments demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance across four mainstream datasets, with especially strong results in highly crowded scenes. More importantly, the experiments support our insights that tackling target size variation is crucial in crowd analysis, and that casting crowd localization as regression in an encoded signal space is quite effective for crowd analysis.
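As a rough illustration of why parallel dilated convolution branches, such as the MDCB described in the abstract, yield a set of different receptive field sizes, the sketch below applies the standard relation for dilated convolutions: a kernel of side k with dilation d spans k + (k - 1)(d - 1) pixels per side. The kernel size, the dilation rates, and the branch names are hypothetical choices for illustration; the abstract does not state the values used in the paper.

```python
from typing import Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Side length in pixels spanned by a single dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stride-1 dilated convolutions applied one after another."""
    receptive_field = 1
    for dilation in dilations:
        # Each stride-1 layer widens the field by (effective kernel size - 1).
        receptive_field += effective_kernel_size(kernel_size, dilation) - 1
    return receptive_field


# Hypothetical dilation rates for three parallel branches (not taken from the paper).
branch_dilations = {"branch_1": 1, "branch_2": 2, "branch_3": 3}

# Each branch uses the same 3x3 kernel but sees a different spatial extent.
branch_sizes = {
    name: effective_kernel_size(3, dilation)
    for name, dilation in branch_dilations.items()
}
print(branch_sizes)  # {'branch_1': 3, 'branch_2': 5, 'branch_3': 7}
```

With the same 3x3 kernel and parameter count, the three assumed branches cover 3, 5, and 7 pixels per side, which is the kind of variety a later weighting step could choose among when people appear at different sizes.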