Throughput-Oriented Automatic FPGA Design Method for Convolutional Neural Network Accelerators

陆维娜; 胡瑜; 叶靖; 李晓维

doi:10.3724/SP.J.1089.2018.17039

Throughput-oriented Automatic Design of FPGA Accelerator for Convolutional Neural Networks

Abstract

Poor resource allocation and frequency settings limit the throughput of FPGA accelerators for convolutional neural networks (CNNs). To address this, a throughput-oriented automatic design method is proposed. First, accelerator design is divided into parallel-strategy design and frequency design, and an overall design flow is presented. Then, design space exploration is formulated as a segment partition problem and solved with a genetic algorithm combined with a greedy algorithm. Finally, the accelerator architecture is implemented according to the explored parallel strategy, and placement and routing are optimized against the explored target frequency so that the achieved frequency meets the expectation. AlexNet and VGG-16 were implemented on the Altera DE5a-Net board with the proposed method. The experimental results show that the method improves resource utilization efficiency and yields reasonable frequency settings; compared with other FPGA accelerator design methods for CNNs, it improves the throughput of AlexNet and VGG-16 by 82.95% and 66.19%, respectively.
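The segment partition view can be pictured with a small Python sketch. It assumes (the abstract does not state this) a layer-pipelined accelerator in which a fixed budget of compute units, drawn as a line segment, is cut into one piece per layer, and the slowest layer sets the throughput. The names `greedy_partition`, `bottleneck_cycles`, and `throughput`, the cost model, and the workload numbers are all hypothetical. The greedy loop is only one simple heuristic and does not reproduce the paper's combined genetic and greedy solver.

```python
def greedy_partition(workloads, total_units):
    """Split `total_units` compute units across layers (hypothetical model).

    workloads[i] is the operation count of layer i. A layer holding u units
    is assumed to need workloads[i] / u cycles per image. Each spare unit is
    handed to whichever layer is currently the slowest.
    """
    n = len(workloads)
    if total_units < n:
        raise ValueError("need at least one unit per layer")
    units = [1] * n  # every layer starts with one unit
    for _ in range(total_units - n):
        # Index of the layer with the most cycles per image right now.
        slowest = max(range(n), key=lambda i: workloads[i] / units[i])
        units[slowest] += 1
    return units


def bottleneck_cycles(workloads, units):
    """Cycles per image of the slowest pipeline stage."""
    return max(w / u for w, u in zip(workloads, units))


def throughput(workloads, units, freq_hz):
    """Images per second under the assumed pipeline model."""
    return freq_hz / bottleneck_cycles(workloads, units)


if __name__ == "__main__":
    layers = [100, 300, 600]          # made-up per-layer operation counts
    split = greedy_partition(layers, 10)
    print(split, throughput(layers, split, 200e6))
```

For the made-up workloads `[100, 300, 600]` and a budget of 10 units, the sketch returns the split `[1, 3, 6]`, which balances all three stages at 100 cycles. Under this model the clock frequency scales throughput linearly, which is why the parallel strategy and the frequency are treated as the two design dimensions.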
