Paper Title
Hardware and software co-optimization for the initialization failure of the ReRAM based cross-bar array
Paper Authors
Abstract
Recent advances in deep neural networks demand more than millions of parameters and mandate high-performance computing resources with improved efficiency. The cross-bar array architecture has been considered one of the most promising deep learning architectures, showing a significant computing gain over conventional processors. To investigate the feasibility of this architecture, we examine its non-idealities and their impact on performance. Specifically, we study the impact of cells that fail during the initialization process of a resistive-memory (ReRAM) based cross-bar array. Unlike conventional memory arrays, individual memory elements cannot be rerouted and thus may have a critical impact on model accuracy. We categorize the possible failures and propose a hardware implementation that minimizes catastrophic failures. This hardware optimization bounds the possible logical values of the failed cells and gives us the opportunity to compensate for the loss of accuracy via off-line training. By introducing random weight defects during training, we show that the model becomes more resilient to device initialization failures and is therefore less prone to degraded inference performance due to failed devices. Our study sheds light on a hardware and software co-optimization procedure to cope with potentially catastrophic failures in the cross-bar array.
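The defect-injection idea in the abstract — randomly clamping a fraction of weights to emulate cells that fail at initialization, so that training learns to tolerate them — can be sketched as below. This is an illustrative sketch only, not the paper's actual implementation; the function name, the `defect_rate` parameter, and the choice of a single stuck value are assumptions for the example.

```python
import numpy as np

def inject_weight_defects(weights, defect_rate, stuck_value=0.0, rng=None):
    """Emulate ReRAM initialization failures by clamping a random
    fraction of weights to a fixed 'stuck' value.

    Illustrative sketch (names and parameters are assumptions, not
    from the paper). Applying this to the weight matrix at each
    training step exposes the model to device-like defects, so the
    learned weights become more resilient to failed cells.

    Returns the defective weight matrix and the boolean defect mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Each cell fails independently with probability `defect_rate`.
    defect_mask = rng.random(weights.shape) < defect_rate
    # Failed cells are pinned to the stuck value; healthy cells pass through.
    defective = np.where(defect_mask, stuck_value, weights)
    return defective, defect_mask
```

In an off-line training loop, one would draw a fresh mask per step and run the forward pass with the defective weights while updating the clean copy, so inference on hardware with real stuck cells sees a weight distribution the model has already encountered.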