Paper title
Efficient, arbitrarily high precision hardware logarithmic arithmetic for linear algebra
Paper authors
Paper abstract
The logarithmic number system (LNS) is arguably not broadly used due to exponential circuit overheads for summation tables relative to arithmetic precision. Methods to reduce this overhead have been proposed, yet still yield designs with high chip area and power requirements. Use remains limited to lower precision or high multiply/add ratio cases, while much of linear algebra (near 1:1 multiply/add ratio) does not qualify. We present a dual-base approximate logarithmic arithmetic comparable to floating point in use, yet unlike LNS it is easily fully pipelined, extendable to arbitrary precision with $O(n^2)$ overhead, and energy efficient at a 1:1 multiply/add ratio. Compared to float32 or float64 vector inner product with FMA, our design is respectively 2.3x and 4.6x more energy efficient in 7 nm CMOS. It depends on exp and log evaluation 5.4x and 3.2x more energy efficient, at 0.23x and 0.37x the chip area for equivalent accuracy versus standard hyperbolic CORDIC using shift-and-add and approximated ODE integration in the style of Revol and Yakoubsohn. This technique is a novel design alternative for low power, high precision hardened linear algebra in computer vision, graphics and machine learning applications.
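The contrast the abstract draws between classic LNS and an exp/log-based approach can be illustrated with a minimal software sketch. It is not the paper's hardware design: it assumes base-2 logarithms, positive operands only (no sign or zero handling), and uses Python floating-point `math` calls as stand-ins for the table lookups and exp/log circuits. All function names are hypothetical.

```python
import math

def to_lns(x):
    # Encode a positive real as its base-2 logarithm (assumed base; sign omitted).
    return math.log2(x)

def from_lns(lx):
    # Decode a log-domain value back to a linear-domain real.
    return 2.0 ** lx

def lns_mul(la, lb):
    # Multiplication in the log domain is a plain addition.
    return la + lb

def lns_add(la, lb):
    # Addition needs the non-linear term log2(1 + 2^d) with d <= 0.
    # Classic LNS hardware reads this from a table whose size grows
    # exponentially with precision, which is the overhead the abstract cites.
    hi, lo = max(la, lb), min(la, lb)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

def inner_product_via_exp_log(xs, ys):
    # Alternative sketch: multiply in the log domain, convert each product
    # to the linear domain with an exp, accumulate linearly, and take a
    # single log at the end. Here exp/log are library calls, not circuits.
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += 2.0 ** lns_mul(to_lns(x), to_lns(y))
    return math.log2(acc)
```

In this sketch the inner product performs one exp per multiply-add and no table-based log-domain addition, which is why the cost of exp and log evaluation dominates at a 1:1 multiply/add ratio.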