Paper Title

Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference

Paper Authors

Jianghao Shen, Yonggan Fu, Yue Wang, Pengfei Xu, Zhangyang Wang, Yingyan Lin

Paper Abstract

While increasingly deep networks are in general still desired for achieving state-of-the-art performance, for many specific inputs a simpler network may already suffice. Existing works exploit this observation by learning to skip convolutional layers in an input-dependent manner. However, we argue that their binary decision scheme, i.e., either fully executing or completely bypassing a layer for a given input, can be enhanced by introducing finer-grained, "softer" decisions. We therefore propose a Dynamic Fractional Skipping (DFS) framework. The core idea of DFS is to treat layer-wise quantization (to different bitwidths) as intermediate "soft" choices between fully utilizing and skipping a layer. For each input, DFS dynamically assigns a bitwidth to both the weights and activations of each layer, where full execution and skipping can be viewed as the two "extremes" (i.e., full bitwidth and zero bitwidth). In this way, DFS can "fractionally" exploit a layer's expressive power during input-adaptive inference, enabling finer-grained accuracy-computational cost trade-offs, and it presents a unified view linking input-adaptive layer skipping with input-adaptive hybrid quantization. Extensive experimental results demonstrate the superior trade-off between computational cost and model expressive power (accuracy) achieved by DFS. Further visualizations indicate a smooth and consistent transition in DFS's behavior, especially in the learned choices between layer skipping and different quantizations as the total computational budget varies, validating our hypothesis that layer quantization can be viewed as an intermediate variant of layer skipping. Our source code and supplementary material are available at https://github.com/Torment123/DFS.

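To make the core mechanism concrete, below is a minimal, hypothetical PyTorch sketch of the fractional-skipping idea, not the authors' released code. All names (`FractionalSkipBlock`, `quantize`, the candidate bitwidths) are illustrative assumptions: a lightweight gate picks a per-input bitwidth for a residual block, with 0 bits acting as the "skip" extreme and full precision as the "fully execute" extreme. The hard argmax here stands in for whatever differentiable gating would be trained in practice.

```python
# A minimal sketch (not the authors' code) of fractional skipping:
# a per-input gate picks a bitwidth for a block, where 0 bits means
# "skip the block" and 32 bits means "execute at full precision".
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(x, bits):
    """Illustrative symmetric uniform quantization to `bits` bits."""
    if bits >= 32:                      # treat 32 as full precision
        return x
    scale = x.abs().max().clamp(min=1e-8)
    levels = 2 ** (bits - 1) - 1
    return torch.round(x / scale * levels) / levels * scale

class FractionalSkipBlock(nn.Module):
    """A conv block whose execution precision is chosen per input:
    0 bits = skip (identity), intermediate bits = quantized execution,
    32 bits = full-precision execution."""
    def __init__(self, channels, candidate_bits=(0, 4, 8, 32)):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.candidate_bits = candidate_bits
        # Lightweight gate: global average pooling + linear scoring
        # over the candidate bitwidth choices.
        self.gate = nn.Linear(channels, len(candidate_bits))

    def forward(self, x):
        # Decide a bitwidth from a global summary of the input.
        logits = self.gate(x.mean(dim=(2, 3)))   # (B, num_choices)
        choice = logits.argmax(dim=1)            # hard decision per input
        # For clarity, process each sample with its chosen bitwidth.
        outs = []
        for i in range(x.size(0)):
            bits = self.candidate_bits[choice[i]]
            if bits == 0:                        # "skip" extreme
                outs.append(x[i:i+1])
            else:                                # quantized execution
                w = quantize(self.conv.weight, bits)
                a = quantize(x[i:i+1], bits)
                y = F.conv2d(a, w, self.conv.bias, padding=1)
                outs.append(x[i:i+1] + F.relu(y))  # residual connection
        return torch.cat(outs, dim=0)

block = FractionalSkipBlock(channels=16)
print(block(torch.randn(2, 16, 8, 8)).shape)     # torch.Size([2, 16, 8, 8])
```

Note the design point the abstract emphasizes: because skipping is just the zero-bitwidth member of the same candidate set as the quantized options, a single gate unifies layer skipping and input-adaptive hybrid quantization rather than treating them as separate mechanisms.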