Paper Title
EResFD: Rediscovery of the Effectiveness of Standard Convolution for Lightweight Face Detection
Paper Authors
Paper Abstract
This paper analyzes the design choices of face detection architectures that improve the trade-off between computational cost and accuracy. Specifically, we re-examine the effectiveness of the standard convolutional block as a lightweight backbone architecture for face detection. Unlike the current trend in lightweight architecture design, which relies heavily on depthwise separable convolution layers, we show that heavily channel-pruned standard convolution layers can achieve better accuracy and inference speed at a similar parameter size. This observation is supported by analyses of the characteristics of the target data domain, faces. Based on this observation, we propose to employ ResNet with a highly reduced number of channels, which surprisingly allows high efficiency compared to other mobile-friendly networks (e.g., MobileNetV1, V2, V3). Through extensive experiments, we show that the proposed backbone can replace that of the state-of-the-art face detector while providing faster inference. We further propose a new feature aggregation method to maximize detection performance. Our proposed detector, EResFD, obtains 80.4% mAP on the WIDER FACE Hard subset while taking only 37.7 ms for VGA image inference on a CPU. Code is available at https://github.com/clovaai/EResFD.
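The comparison "at a similar parameter size" can be made concrete with some simple arithmetic. The sketch below is illustrative only and is not taken from the paper or its code: the function names, the 3x3 kernel, and the width of 64 channels are assumptions chosen for the example, and bias terms and normalization layers are ignored. It counts the weights of one depthwise separable layer and finds the widest standard convolution layer that fits in the same budget.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def matched_standard_width(channels, k=3):
    """Largest width w such that a standard conv with w input and w output
    channels uses no more weights than a depthwise separable layer with
    `channels` input and output channels."""
    budget = depthwise_separable_params(channels, channels, k)
    width = 1
    while standard_conv_params(width + 1, width + 1, k) <= budget:
        width += 1
    return width


if __name__ == "__main__":
    c = 64  # hypothetical layer width, not a value from the paper
    print("depthwise separable, 64 channels:", depthwise_separable_params(c, c))
    print("standard conv, 64 channels:", standard_conv_params(c, c))
    print("standard conv width at the same budget:", matched_standard_width(c))
```

Under these assumptions, a 64-channel depthwise separable layer has 4,672 weights, a 64-channel standard layer has 36,864, and a standard layer needs to be reduced to about 22 channels to fit the same budget. Parameter count alone does not determine latency, which is why the abstract reports measured inference speed separately.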