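Libo Chang, Danni Wu, Huimin Du, Shengbing Zhang, Peng Hao, Xiuxia Cai. Research on an Energy-Efficient Adaptive Mapping Strategy for Reconfigurable CNN Processors[J]. Journal of Computer-Aided Design & Computer Graphics. DOI: 10.3724/SP.J.1089.null.2023-00433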
Libo Chang, Danni Wu, Huimin Du, Shengbing Zhang, Peng Hao, Xiuxia Cai. Adaptive Mapping for Data Orchestration in Reconfigurable CNN Processors via Deep Reinforcement Learning[J]. Journal of Computer-Aided Design & Computer Graphics. DOI: 10.3724/SP.J.1089.null.2023-00433

Research on an Energy-Efficient Adaptive Mapping Strategy for Reconfigurable CNN Processors

Adaptive Mapping for Data Orchestration in Reconfigurable CNN Processors via Deep Reinforcement Learning

  • Abstract: CNN models have enormous parameter counts and data-access volumes, and computation patterns vary both across CNN models and across the layers of a single model. As a result, computing systems that support only a single or fixed mapping scheme and a fixed on-chip data cache achieve low overall energy efficiency. To address this, we propose a hardware/software co-design strategy that combines an adaptive memory-access optimization mechanism based on deep reinforcement learning with a dynamic partitioning method for an elastic on-chip cache. Given the storage-structure parameters of the reconfigurable CNN processor, the mechanism automatically searches for the optimal loop scheduling strategy for each CNN layer. A reconfigurable on-chip interconnect, address-mapping logic, and a dynamic storage scheduling method allow the elastic on-chip cache to repartition its address-mapping space dynamically according to the selected scheduling strategy. On CNN processor architectures based on Eyeriss and TPU, the proposed method improves the energy efficiency of the two reconfigurable CNN processors by about 3 times and 4 times, respectively, compared with recently proposed scheduling strategies. With the same scheduling strategy, the elastic storage partitioning method reduces power consumption by 30.28% and 18.43%, respectively, compared with a fixed-capacity dual-cache structure. Compared with recent FPGA-based work, the reconfigurable CNN processor improves computational efficiency and computational energy efficiency by about 10 times and 2 times, respectively.
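As a rough illustration of why an elastically partitioned on-chip cache can help a per-layer schedule search, the sketch below compares the best loop tiling found under a shared capacity with the best tiling found under fixed per-operand shares. It is not the paper's method: an exhaustive search stands in for the deep-reinforcement-learning search, and the traffic model, the layer shape, the capacity, and the fixed shares are all assumptions made for this example.

```python
from itertools import product
from math import ceil

# Hypothetical layer: (K out-channels, C in-channels, H, W output size, R, S filter size).
LAYER = (16, 16, 14, 14, 3, 3)
CAPACITY = 4096                    # assumed on-chip capacity, in words
FIXED_SHARES = (1024, 2048, 1024)  # assumed fixed split: weights, inputs, outputs


def footprints(layer, tile):
    """On-chip words needed for the weight, input, and output tiles.

    Assumes stride 1, with each output row needing R input rows
    and a full (padded) input width of W + S - 1.
    """
    K, C, H, W, R, S = layer
    tk, tc, th = tile
    weights = tk * tc * R * S
    inputs = tc * (th + R - 1) * (W + S - 1)
    outputs = tk * th * W
    return weights, inputs, outputs


def traffic(layer, tile):
    """Crude off-chip traffic estimate, in words.

    Assumes the input-channel loop is innermost, so each output tile is
    accumulated on chip and written once, while the weight and input
    tiles are reloaded for every tile combination.
    """
    K, C, H, W, R, S = layer
    tk, tc, th = tile
    nk, nc, nh = ceil(K / tk), ceil(C / tc), ceil(H / th)
    w, i, o = footprints(layer, tile)
    return nk * nc * nh * (w + i) + nk * nh * o


def best_tiling(layer, fits):
    """Exhaustively pick the feasible tiling with the least estimated traffic."""
    K, C, H, W, R, S = layer
    candidates = product(range(1, K + 1), range(1, C + 1), range(1, H + 1))
    feasible = [t for t in candidates if fits(footprints(layer, t))]
    return min(feasible, key=lambda t: traffic(layer, t), default=None)


def elastic_fits(fp):
    # Elastic cache: the three tiles may split the capacity in any proportion.
    return sum(fp) <= CAPACITY


def fixed_fits(fp):
    # Fixed cache: each tile must fit inside its own pre-assigned share.
    return all(f <= share for f, share in zip(fp, FIXED_SHARES))


elastic_tile = best_tiling(LAYER, elastic_fits)
fixed_tile = best_tiling(LAYER, fixed_fits)
print("elastic:", elastic_tile, traffic(LAYER, elastic_tile))
print("fixed:  ", fixed_tile, traffic(LAYER, fixed_tile))
```

Because the fixed shares sum to the full capacity, every tiling that fits the fixed split also fits the elastic cache, so in this model the elastic search can never return a higher traffic estimate than the fixed one. How much it gains depends on the layer shape, which is the motivation for choosing the partition per layer.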

     
