PANDA: Processing-in-MRAM Accelerated De Bruijn Graph based DNA Assembly

Paper Title

PANDA: Processing-in-MRAM Accelerated De Bruijn Graph based DNA Assembly

Authors

Shaahin Angizi, Naima Ahmed Fahmi, Wei Zhang, Deliang Fan

Abstract

Spurred by the widening gap between data processing speed and data communication speed in von Neumann computing architectures, some bioinformatics applications have harnessed the computational power of Processing-in-Memory (PIM) platforms. However, the performance of PIMs unavoidably diminishes when dealing with such complex applications, which require bulk bit-wise comparison or addition operations. In this work, we present PANDA, an efficient Processing-in-MRAM Accelerated De Bruijn Graph based DNA Assembly platform built on an optimized, hardware-friendly genome assembly algorithm. PANDA is able to assemble large-scale DNA sequence datasets from all-pair overlaps. We first design the PANDA platform, which exploits MRAM as a computational memory and converts it into a potent processing unit for genome assembly. PANDA can execute not only the efficient bulk bit-wise X(N)OR-based comparison/addition operations heavily required for the genome assembly task but also a full set of 2-/3-input logic operations inside the MRAM chip. We then develop a highly parallel, step-by-step, hardware-friendly DNA assembly algorithm for PANDA that requires only the developed in-memory logic operations. The platform is then configured with a novel data partitioning and mapping technique that provides local storage and processing to fully exploit algorithm-level parallelism. Cross-layer simulation results demonstrate that PANDA reduces run time and power by factors of 18 and 11, respectively, compared with a CPU. In addition, speed-ups of up to 2-4x can be obtained over recent processing-in-MRAM platforms performing the same task.
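
To make the two core ideas in the abstract concrete, the following is a minimal software sketch of an XNOR-based k-mer comparison and of De Bruijn graph construction from reads. It is not the PANDA hardware design or the authors' algorithm: the 2-bit nucleotide encoding and the function names `encode`, `xnor_equal`, and `build_de_bruijn` are assumptions introduced purely for illustration.

```python
from collections import defaultdict

# Hypothetical 2-bit nucleotide encoding (an assumption, not taken from the paper).
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(kmer: str) -> int:
    """Pack a k-mer into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODING[base]
    return value


def xnor_equal(a: int, b: int, k: int) -> bool:
    """Compare two packed k-mers: the XNOR of equal words is all ones."""
    mask = (1 << (2 * k)) - 1          # keep only the 2*k meaningful bits
    return ((~(a ^ b)) & mask) == mask


def build_de_bruijn(reads, k):
    """Link each (k-1)-mer prefix to the (k-1)-mer suffix of every k-mer in the reads."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

In this sketch, `xnor_equal` stands in for the bulk bit-wise comparison that the abstract says is performed inside the MRAM chip, while `build_de_bruijn` shows only the graph structure that such comparisons would help build; graph traversal and contig generation are left out.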
