Zeng Chenglong, Liu Qiang. Design of High Performance Convolutional Neural Network Accelerator for Embedded FPGA[J]. Journal of Computer-Aided Design & Computer Graphics, 2019, 31(9): 1645-1652. DOI: 10.3724/SP.J.1089.2019.17423

Design of High Performance Convolutional Neural Network Accelerator for Embedded FPGA

  • Abstract: Convolutional neural network (CNN) accelerators on embedded field-programmable gate array (FPGA) platforms are limited in processing speed by limited resources. This paper proposes a high-performance CNN accelerator to address this problem. First, a software-hardware co-operation architecture is designed according to the characteristics of CNNs and embedded FPGA platforms. Then, under the constraints of storage and computing resources, two optimization strategies are proposed: two-dimensional direct memory access (2D DMA) blocking, and a trade-off between the use of digital signal processing (DSP) units and look-up tables (LUTs). Finally, for a face detection application, the SSD network model is optimized and a software-hardware pipeline structure is adopted to improve the overall performance of the face detection system. The accelerator is implemented on a Xilinx ZC706 development board. Experimental results show that it achieves an average performance of 167.5 GOPS and a face detection rate of 81.2 frames/s; both its average performance and its face detection rate are 1.58 times those of the embedded GPU platform TX2.

     
