Paper Title
A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC Operation for 4-bit Input Processing
Authors
Abstract
This paper presents a low-cost PMOS-based 8T (P-8T) SRAM compute-in-memory (CIM) architecture that efficiently performs multiply-accumulate (MAC) operations between 4-bit input activations and 8-bit weights. First, a bit-line (BL) charge-sharing technique is employed to realize low-cost and reliable digital-to-analog conversion of the 4-bit input activations in the proposed SRAM CIM, where charge-domain analog computing provides variation-tolerant and linear MAC outputs. The 16 local arrays are also exploited to implement an analog multiplication unit (AMU) that simultaneously produces 16 multiplication results between 4-bit input activations and 1-bit weights. To reduce the hardware cost of the analog-to-digital converter (ADC) without sacrificing DNN accuracy, hardware-aware system simulations are performed to decide the ADC bit resolutions and the number of activated rows in the proposed CIM macro. In addition, for the ADC operation, AMU-based reference columns are used to generate the ADC reference voltages, with which a low-cost 4-bit coarse-fine flash ADC has been designed. A 256×80 P-8T SRAM CIM macro implemented in a 28 nm CMOS process achieves accuracies of 91.46% and 66.67% on the CIFAR-10 and CIFAR-100 datasets, respectively, with an energy efficiency of 50.07 TOPS/W.
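The abstract describes building a 4-bit by 8-bit MAC out of products between 4-bit inputs and 1-bit weights. The arithmetic behind that decomposition can be illustrated with a short behavioral sketch in Python. This is not the authors' circuit or simulation flow. It assumes signed two's-complement weights, a digital shift-and-add recombination of the per-bit partial sums, and an idealized uniform 4-bit quantizer standing in for the ADC, none of which the abstract specifies. All function names are hypothetical.

```python
def weight_bit_planes(weights, bits=8):
    """Split signed integers into two's-complement bit planes, LSB first."""
    mask = (1 << bits) - 1
    return [[((w & mask) >> b) & 1 for w in weights] for b in range(bits)]


def bit_serial_mac(inputs, weights, bits=8):
    """MAC of unsigned 4-bit inputs and signed weights using 1-bit weight planes."""
    total = 0
    for b, plane in enumerate(weight_bit_planes(weights, bits)):
        # Per-plane partial sum: each term is a 4-bit input times a 1-bit weight.
        partial = sum(x * wb for x, wb in zip(inputs, plane))
        # Assumption: two's complement, so the MSB plane carries negative weight.
        scale = -(1 << b) if b == bits - 1 else (1 << b)
        total += scale * partial
    return total


def quantize(partial, max_value, adc_bits=4):
    """Idealized uniform ADC (assumption): map [0, max_value] to a 4-bit code."""
    levels = (1 << adc_bits) - 1
    return round(partial * levels / max_value)


inputs = [15, 0, 7, 3]          # 4-bit unsigned activations
weights = [-128, 127, -1, 5]    # 8-bit signed weights
exact = sum(x * w for x, w in zip(inputs, weights))
assert bit_serial_mac(inputs, weights) == exact
```

Without quantization the bit-plane sum reproduces the exact MAC result. Passing each per-plane partial sum through `quantize` before recombination gives a rough way to see how ADC resolution and the number of accumulated rows (which sets `max_value`) trade off against accuracy, which is the kind of question the hardware-aware simulations in the abstract address.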