Abstract:
The large parameter counts and heavy data-access demands of existing CNN models, together with the diversity of computing patterns across different CNN models and across different layers of the same model, lead to low overall computing efficiency in computing systems that support only a single or fixed mapping mode and on-chip data cache. Using a hardware/software co-design strategy, we combine an adaptive memory access optimization mechanism based on deep reinforcement learning with a dynamic partitioning method for an on-chip elastic cache, and automatically search for the optimal loop scheduling strategy for each CNN operation layer according to the relevant parameters of the storage structure in the reconfigurable CNN processor. Through the design of the on-chip interconnection structure, the address mapping logic, and the dynamic partitioning method, the elastic storage can repartition its address mapping space at run time according to the scheduling strategy in use. On CNN processor architectures based on Eyeriss and TPU, compared with the optimal scheduling strategy, the proposed scheduling strategy and on-chip cache partitioning method improve the energy efficiency of the two reconfigurable CNN processors by almost 3 times and 4 times, respectively. Moreover, compared with a fixed-capacity dual cache structure, the elastic storage partitioning method proposed in this paper reduces power consumption by 30.28% and 18.43%, respectively, under the same scheduling strategy. Compared with the latest research based on FPGA platforms, the reconfigurable CNN processor in this paper improves computational efficiency and computational energy efficiency by almost 10 times and 2 times, respectively.
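
To make the interaction between scheduling and cache partitioning concrete, the following is a minimal sketch and not the method of this paper. It replaces the deep reinforcement learning search with a plain exhaustive search over tile sizes, and it uses a deliberately simplified, hypothetical off-chip traffic model. The layer shape, the buffer capacity `TOTAL`, the fixed split `FIXED`, and all function names are assumptions chosen for illustration only.

```python
# Illustrative only: exhaustive tile search under a toy traffic model.
# The paper uses deep reinforcement learning; this sketch does not.

def divisors(n):
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def tile_footprint(tm, tc, E, K):
    """On-chip words needed for one tile (stride 1, E x E output, K x K kernel)."""
    H = E + K - 1  # input feature-map side length
    return {
        "weight": tm * tc * K * K,
        "input": tc * H * H,
        "output": tm * E * E,
    }

def dram_traffic(tm, M, C, E, K):
    """Toy model: weights read once, outputs written once,
    inputs re-read once per output-channel tile."""
    H = E + K - 1
    return M * C * K * K + (M // tm) * C * H * H + M * E * E

def best_schedule(M, C, E, K, fits):
    """Return (traffic, tm, tc) of the cheapest tiling accepted by `fits`."""
    best = None
    for tm in divisors(M):
        for tc in divisors(C):
            if not fits(tile_footprint(tm, tc, E, K)):
                continue
            cost = dram_traffic(tm, M, C, E, K)
            if best is None or cost < best[0]:
                best = (cost, tm, tc)
    return best

# Hypothetical layer and buffer sizes (in words).
M, C, E, K = 64, 64, 14, 3
TOTAL = 16384
FIXED = {"weight": 4096, "input": 4096, "output": 8192}  # sums to TOTAL

# Elastic cache: only the total capacity constrains the tile.
elastic = best_schedule(M, C, E, K, lambda fp: sum(fp.values()) <= TOTAL)
# Fixed partition: each data type must fit in its own region.
fixed = best_schedule(M, C, E, K, lambda fp: all(fp[k] <= FIXED[k] for k in fp))

print("elastic (traffic, tm, tc):", elastic)
print("fixed   (traffic, tm, tc):", fixed)
```

Because every tiling that fits the fixed split also fits the shared capacity, the elastic search can never do worse under this model. In this example it can hold all 64 output channels on chip at once, which the fixed output region cannot, so the input feature maps are fetched from off-chip memory fewer times.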