Paper Title
Stochastic Rounding: Algorithms and Hardware Accelerator
Authors
Abstract
Algorithms and a hardware accelerator for performing stochastic rounding (SR) are presented. The main goal is to augment the ARM M4F-based multi-core processor SpiNNaker2 with more flexible rounding functionality than is available in the ARM processor itself. The motivation for adding such an accelerator in hardware comes from our previous results, which showed that SR improves the numerical accuracy of ODE solvers in fixed-point arithmetic compared to the standard round-to-nearest or bit-truncation rounding modes. Furthermore, performing SR purely in software can be expensive, because it requires a pseudorandom number generator (PRNG), multiple masking and shifting instructions, and an addition operation. Saturation of the rounded values is also included, since rounding is usually followed by saturation, which is especially important in fixed-point arithmetic because of its narrow dynamic range of representable values. The main intended use of the accelerator is to round fixed-point multiplier outputs, which the ARM processor returns unrounded in a wider fixed-point format than the arguments.
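The software steps named in the abstract (a PRNG draw, an addition, a shift, and a saturation) can be sketched as below. This is a minimal illustration under stated assumptions, not the accelerator's actual algorithm: the signed 32-bit format with 15 fractional bits, the names `FRAC_BITS`, `saturate`, `sr_round`, and `mul_sr`, and the use of Python's `random.Random` as the PRNG are all hypothetical choices made for the example.

```python
import random

# Assumption: signed 32-bit fixed point with 15 fractional bits.
# The abstract does not fix a format; this one is chosen for illustration.
FRAC_BITS = 15
INT_MIN = -(1 << 31)
INT_MAX = (1 << 31) - 1


def saturate(x):
    """Clamp x to the representable signed 32-bit range."""
    return max(INT_MIN, min(INT_MAX, x))


def sr_round(value, drop_bits, rng):
    """Stochastically round away the lowest drop_bits bits of value.

    Adding a uniform random integer in [0, 2**drop_bits) and then
    flooring with an arithmetic right shift rounds up with probability
    equal to the discarded fraction, and down otherwise.
    """
    r = rng.getrandbits(drop_bits)
    return (value + r) >> drop_bits


def mul_sr(a, b, rng):
    """Multiply two fixed-point values, then apply SR and saturation."""
    product = a * b  # wide result with 2 * FRAC_BITS fractional bits
    return saturate(sr_round(product, FRAC_BITS, rng))
```

With `rng = random.Random(0)`, a call such as `mul_sr(1, 1 << 14, rng)` multiplies the smallest positive value by 0.5, so the exact product lies halfway between two representable values and the function returns either 0 or 1, each with probability one half. Averaged over many calls, the result approaches the exact product, which is the property that distinguishes SR from round-to-nearest and truncation.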