Paper title
Redesigning Multi-Scale Neural Network for Crowd Counting
Paper authors
Abstract
Perspective distortions and crowd variations make crowd counting a challenging task in computer vision. To tackle it, many previous works have used multi-scale architectures in deep neural networks (DNNs). Multi-scale branches can be either directly merged (e.g., by concatenation) or merged through the guidance of proxies (e.g., attention) in the DNNs. Despite their prevalence, these combination methods are not sophisticated enough to deal with the per-pixel performance discrepancy over multi-scale density maps. In this work, we redesign the multi-scale neural network by introducing a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting. Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales; pixel-wise soft gating nets are introduced to provide pixel-wise soft weights for scale combinations in different hierarchies. The network is optimized using both the crowd density map and the local counting map, where the latter is obtained by local integration on the former. Optimizing both can be problematic because of their potential conflicts. We introduce a new relative local counting loss based on relative count differences among hard-predicted local regions in an image, which proves to be complementary to the conventional absolute error loss on the density map. Experiments show that our method achieves state-of-the-art performance on five public datasets, i.e., ShanghaiTech, UCF_CC_50, JHU-CROWD++, NWPU-Crowd and Trancos.
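Two operations named in the abstract can be illustrated with a minimal sketch: deriving a local counting map by integrating the density map over local regions, and merging per-scale density maps with pixel-wise soft weights. The abstract does not give the exact formulations, so everything below is an assumption made for illustration: the function names `local_count_map` and `soft_gate_merge` are hypothetical, the regions are taken to be non-overlapping square blocks, and the soft weights are taken to be a per-pixel softmax over gate scores. It is not the authors' implementation.

```python
import math


def local_count_map(density, block):
    """Sum a 2D density map over non-overlapping block x block regions.

    Assumption: "local integration" is modeled as plain block-wise summation.
    """
    rows, cols = len(density), len(density[0])
    return [
        [
            sum(
                density[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            for c0 in range(0, cols, block)
        ]
        for r0 in range(0, rows, block)
    ]


def soft_gate_merge(density_maps, gate_logits):
    """Merge per-scale density maps using per-pixel softmax weights.

    Assumption: one gate score per scale and per pixel; the softmax over
    scales yields soft weights that sum to 1 at every pixel.
    """
    rows, cols = len(density_maps[0]), len(density_maps[0][0])
    merged = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            logits = [g[r][c] for g in gate_logits]
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in logits]
            total = sum(exps)
            merged[r][c] = sum(
                e / total * d[r][c] for e, d in zip(exps, density_maps)
            )
    return merged
```

Under these assumptions, block-wise summation preserves the total count, so the local counting map and the density map agree on the number of people in the image while differing in spatial granularity.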