Paper Title
Fast Implementation of Morphological Filtering Using ARM NEON Extension
Paper Authors
Abstract
In this paper we consider the speedup potential of morphological image filtering on ARM processors. Morphological operations are widely used in image analysis and recognition, and in some cases accelerating them can significantly reduce the overall execution time of recognition. More specifically, we propose a fast implementation of erosion and dilation using NEON, the ARM SIMD extension. With a rectangular structuring element these operations are separable, so we implemented them as sequential horizontal and vertical passes. Each pass uses the van Herk/Gil-Werman algorithm for large windows and a linear-complexity algorithm with a low constant factor for small windows. The final implementation combines these methods and is accelerated with SIMD. We also considered a fast ARM NEON transpose of 8x8 and 16x16 matrices to obtain an additional computational gain for morphological operations. Experiments showed a threefold efficiency increase for the final implementation of erosion and dilation compared with the van Herk/Gil-Werman algorithm without SIMD, and speedups of 5.7 times for the 8x8 matrix transpose and 12 times for the 16x16 matrix transpose compared with a transpose without SIMD.
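
The separable scheme described in the abstract can be illustrated with a minimal scalar sketch. It is not the authors' NEON implementation: it is plain Python without SIMD, and the names running_min_vhgw, transpose, and erode, the centered window, and the border padding value of 255 (8-bit images assumed) are all assumptions made for illustration. The 1D pass follows the van Herk/Gil-Werman idea of combining per-block prefix and suffix minima, and the vertical pass is run as a horizontal pass on the transposed image, which is where a fast transpose would pay off.

```python
def running_min_vhgw(row, w, pad=255):
    """Sliding minimum over a centered window of width w (van Herk/Gil-Werman).

    pad is an assumed border value: the identity of min for 8-bit pixels.
    """
    n = len(row)
    left = w // 2
    # Pad both borders, then extend to a whole number of blocks of length w.
    buf = [pad] * left + list(row) + [pad] * (w - 1 - left)
    buf += [pad] * (-len(buf) % w)
    m = len(buf)
    g = buf[:]  # g[j]: min from the start of j's block up to j
    h = buf[:]  # h[j]: min from j up to the end of j's block
    for j in range(1, m):
        if j % w:  # not the first element of a block
            g[j] = min(g[j - 1], buf[j])
    for j in range(m - 2, -1, -1):
        if j % w != w - 1:  # not the last element of a block
            h[j] = min(h[j + 1], buf[j])
    # Each window [i, i + w - 1] is a block suffix followed by a block prefix.
    return [min(h[i], g[i + w - 1]) for i in range(n)]


def transpose(img):
    """Transpose a list-of-rows image."""
    return [list(col) for col in zip(*img)]


def erode(img, wx, wy):
    """Erosion with a wx-by-wy rectangle as two separable 1D passes."""
    rows = [running_min_vhgw(r, wx) for r in img]              # horizontal pass
    cols = [running_min_vhgw(c, wy) for c in transpose(rows)]  # vertical pass
    return transpose(cols)
```

For example, erode(img, 3, 3) applies a 3x3 rectangular structuring element. Dilation would follow the same pattern with max in place of min and a padding value of 0. The number of comparisons per pixel in running_min_vhgw does not grow with w, which is why the abstract reserves this algorithm for large windows.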