Paper title
FCN-Pose: A Pruned and Quantized CNN for Robot Pose Estimation for Constrained Devices
Paper authors
Paper abstract
IoT devices suffer from resource limitations in processing power, RAM, and disk storage. These limitations become more evident when handling demanding applications such as deep learning, which is well known for its heavy computational requirements. A case in point is robot pose estimation, an application that predicts the keypoints of a target object in an image. One way to mitigate the processing and storage problems is to compress the deep learning model. This paper proposes a new CNN for pose estimation and applies two compression techniques, pruning and quantization, to reduce its demands and improve the response time. While pruning reduces the total number of parameters required for inference, quantization decreases the precision of the floating-point values. We run the approach on a pose estimation task for a robotic arm and compare the results on a high-end device and a constrained device. As metrics, we consider the number of Floating-point Operations Per Second (FLOPS), the total number of mathematical computations, the number of parameters, the inference time, and the number of video frames processed per second. In addition, we undertake a qualitative evaluation in which we compare the output image predicted by each pruned network with that of the corresponding original network. We prune the originally proposed network at a 70% pruning rate, which yields an 88.86% reduction in parameters, a 94.45% reduction in FLOPS, and a 70% reduction in disk storage, while increasing the error by a mere $1\%$. With regard to input image processing, the frame rate increases from 11.71 FPS to 41.9 FPS on the desktop. On the constrained device, it increases from 2.86 FPS to 10.04 FPS. The higher frame processing rate achieved by the proposed approach allows a much shorter response time.
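The link between frame rate and response time can be made concrete with a little arithmetic. The sketch below converts the frame rates reported in the abstract into an average per-frame latency and a speedup factor. It assumes frames are processed one at a time, so that latency is simply the reciprocal of throughput; the paper may measure inference time differently. The names `frame_latency_ms`, `summarize`, and `reported_fps` are illustrative and do not come from the paper.

```python
def frame_latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds at a given frame rate.

    Assumes sequential processing (no pipelining or batching), so the
    latency is the reciprocal of the throughput.
    """
    return 1000.0 / fps


def summarize(before_fps: float, after_fps: float) -> dict:
    """Per-frame latency before and after compression, plus the speedup."""
    return {
        "latency_before_ms": frame_latency_ms(before_fps),
        "latency_after_ms": frame_latency_ms(after_fps),
        "speedup": after_fps / before_fps,
    }


# Frame rates quoted in the abstract: (original network, 70% pruned network).
reported_fps = {
    "desktop": (11.71, 41.9),
    "constrained device": (2.86, 10.04),
}

for device, (before, after) in reported_fps.items():
    s = summarize(before, after)
    print(
        f"{device}: {s['latency_before_ms']:.1f} ms -> "
        f"{s['latency_after_ms']:.1f} ms per frame "
        f"({s['speedup']:.2f}x)"
    )
```

Under this assumption, the desktop drops from roughly 85 ms to roughly 24 ms per frame, and the constrained device from roughly 350 ms to roughly 100 ms, a speedup of about 3.5x in both cases.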