Heterogeneous Architecturally Parallel Error Detection with Low Error Detection Latency for Highly Reliable Automotive Electronic Systems
Abstract: Compared with the dual-core lockstep technique commonly used in industry, heterogeneous parallel error detection achieves similar error coverage with a smaller area overhead, but it increases error detection latency and degrades the performance of the main core. To address the potential safety risks of errors that are not detected in time, a low-latency heterogeneous architecturally parallel error detection method is proposed. First, the release of physical registers is stalled while register contents are being copied, which reduces the impact of register copying on main-core performance. Second, the main core's control flow is used to guide instruction fetch in the checker cores, and program segments are delimited according to their predicted running time on the checker cores; this improves error detection performance and keeps the maximum error detection latency bounded. The method was implemented with one open-source XiangShan processor core as the main core and 16 open-source Rocket processors as the checker cores. Experimental results on benchmark programs show that the method achieves error detection with 50% logic area overhead and 22% storage area overhead, less than the nearly 100% area overhead of dual-core lockstep. The average performance overhead on the main core is below 1%, and the error detection latency is kept within 2000 clock cycles. In addition, the average performance of the checker cores improves by 14.9% compared with the original branch prediction strategy.
Keywords: error detection; fault tolerance; reliability
Chinese Library Classification number: TP391.41
DOI: 10.3724/SP.J.1089.2023.19937
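The idea of delimiting program segments by their predicted running time on a checker core can be illustrated with a minimal sketch. It is not the paper's implementation: the per-instruction cost list, the `Segment` type, the function name `split_by_predicted_cycles`, and the greedy cut rule are all assumptions made for illustration, and the abstract does not state how the prediction is made or what budget is used. The sketch only shows why bounding the predicted checking time of each segment bounds how long an error in that segment can remain undetected.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    start: int             # index of the first instruction in the segment
    end: int               # one past the last instruction in the segment
    predicted_cycles: int  # predicted checker-core cycles for the segment


def split_by_predicted_cycles(costs: Sequence[int], budget: int) -> List[Segment]:
    """Greedily cut an instruction stream into segments.

    costs[i] is a hypothetical predicted checker-core cycle count for
    instruction i. A segment is closed just before adding the next
    instruction would push its predicted time above `budget`.
    """
    segments: List[Segment] = []
    start = 0
    acc = 0
    for i, cost in enumerate(costs):
        if cost > budget:
            raise ValueError("a single instruction exceeds the cycle budget")
        if acc + cost > budget:
            # Close the current segment before it would exceed the budget.
            segments.append(Segment(start, i, acc))
            start = i
            acc = 0
        acc += cost
    if start < len(costs):
        # Emit the trailing, possibly short, segment.
        segments.append(Segment(start, len(costs), acc))
    return segments
```

With hypothetical costs `[3, 4, 2, 5, 1]` and a budget of 7 cycles, the stream is cut into the instruction ranges `[0, 2)`, `[2, 4)`, and `[4, 5)`, none of which is predicted to take more than 7 checker cycles.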