Design of an Adaptive Framework for Accelerating Deep Learning Training

Template-Based Adaptive Training Acceleration Framework for Deep Learning Algorithms

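  • Summary: Accelerating the training of deep learning algorithms on FPGAs usually requires a long development cycle and extensive hardware design experience. To address this challenge, we design a training acceleration framework for deep learning algorithms based on adaptive template technology, study application scale, parallel scheduling strategy, resource usage, and functional extension in depth, and propose corresponding optimization strategies. Using CPU-FPGA heterogeneous acceleration templates, we propose an adaptive upper-layer model compilation framework that adapts to different hardware acceleration resources. This template-based hardware-software co-design adapts well to different FPGA chips and supports rapid algorithm iteration. In comparative acceleration experiments on graph neural network algorithm data, the framework achieves a 7x to 41x speedup over a CPU.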

     

    Abstract: Field-programmable gate arrays (FPGAs) are often used to accelerate the training phase of deep learning algorithms, but achieving satisfactory execution performance usually requires a long development cycle and extensive hardware design expertise. To address this challenge, this paper proposes an adaptive acceleration framework for deep learning algorithms. We investigate application scale, parallel scheduling strategy, resource usage, and functional scalability. Based on CPU-FPGA heterogeneous acceleration templates, an adaptive model compiler is proposed that customizes the accelerator according to the algorithm's complexity and the available hardware resources. The proposed hardware-software co-design framework adapts effectively to different FPGA hardware resources and supports the fast evolution of deep learning algorithms. Taking a graph neural network as an example, it achieves a 7x to 41x performance improvement over a general-purpose CPU platform.

     
