Paper Title
On-Demand Redundancy Grouping: Selectable Soft-Error Tolerance for a Multicore Cluster
Authors
Abstract
With the shrinking of technology nodes and the use of parallel processor clusters in hostile and critical environments, such as space, run-time faults caused by radiation are a serious cross-cutting concern that also impacts architectural design. This paper introduces an architectural approach to run-time configurable soft-error tolerance at the core level, augmenting a six-core open-source RISC-V cluster with a novel On-Demand Redundancy Grouping (ODRG) scheme. ODRG allows the cluster to operate either as two fault-tolerant cores or as six individual cores for high performance, with limited overhead to switch between these modes during run-time. The ODRG unit adds less than 11% of a core's area for a three-core group, or 1% of the cluster area in total, and shows a negligible timing increase. This compares favorably to a commercial state-of-the-art implementation, and fault-recovery re-synchronization is 2.5$\times$ faster. Furthermore, when redundancy is not necessary, the ODRG approach allows the redundant cores to be used for independent computation, allowing up to a 2.96$\times$ increase in performance for selected applications.
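The two operating modes described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the paper's hardware design: it assumes that each three-core group resolves disagreements by majority voting on core outputs, which the abstract does not spell out, and the names `majority_vote` and `cluster_step` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def majority_vote(outputs: List[int]) -> Tuple[Optional[int], List[int]]:
    """Vote over the outputs of one redundant group (assumed voting scheme).

    Returns (voted value, indices of disagreeing cores). The voted value is
    None when no strict majority exists, in which case all cores are flagged.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        return None, list(range(len(outputs)))
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty


def cluster_step(core_outputs: List[int], redundant: bool,
                 group_size: int = 3) -> List[Tuple[Optional[int], List[int]]]:
    """Model one step of the cluster in either mode (hypothetical model).

    Independent mode: every core's output is used as-is, one result per core.
    Redundant mode: cores are split into groups of `group_size`, and each
    group yields a single voted result plus the cores that disagreed.
    """
    if not redundant:
        return [(out, []) for out in core_outputs]
    groups = [core_outputs[i:i + group_size]
              for i in range(0, len(core_outputs), group_size)]
    return [majority_vote(group) for group in groups]


# Six cores in redundant mode: core 2 of the first group has a flipped result.
print(cluster_step([7, 7, 9, 4, 4, 4], redundant=True))
# Same six cores in independent mode: six separate results, no voting.
print(cluster_step([7, 7, 9, 4, 4, 4], redundant=False))
```

In this model, redundant mode produces two logical results from six cores and independent mode produces six, so the ideal gain from releasing the redundant cores is 6/2 = 3$\times$. The reported 2.96$\times$ for selected applications sits just under that bound.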