Citation: Li Jiajun, Xu Haobo, Wang Yujie, Xiao Hang, Wang Ying, Han Yinhe, Li Xiaowei. Design and Training of Binarized Neural Networks for Highly Efficient Accelerators[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(6): 961-969. DOI: 10.3724/SP.J.1089.2023.19461

Design and Training of Binarized Neural Networks for Highly Efficient Accelerators


Abstract: To address the problems of computation overflow and multiplier dependence in binarized neural network (BNN) accelerators, a set of BNN design and training methods is proposed. First, a matrix multiplication that simulates overflow is designed to ensure that the BNN loses no accuracy after deployment. Second, the convolutional layers and activation functions of the BNN are optimized to reduce the total amount of overflow. Third, an operator named Shift-based Batch Normalization is proposed to free the BNN from its dependence on multiplication and to effectively reduce memory access. Finally, a collaborative training framework based on overflow heuristics is proposed for the improved BNN to ensure that training converges. Experimental results show that, compared with 10 mainstream keyword-spotting methods, the proposed approach reduces on-chip computation by more than 49.1% and speeds up the accelerator by at least 21.0% with no significant loss of accuracy.
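The general idea behind a shift-based batch normalization operator can be illustrated with a minimal sketch: the per-channel scale factor gamma/sqrt(var + eps) is rounded to the nearest power of two, so the channel-wise multiplication reduces to a bit shift in hardware. This is an illustrative reconstruction under that assumption, not the paper's exact operator; the function name and signature are hypothetical.

```python
import numpy as np

def shift_based_batchnorm(x, mean, var, gamma, beta, eps=1e-5):
    """Multiplier-free batch-normalization sketch.

    The usual affine scale gamma / sqrt(var + eps) is quantized to the
    nearest power of two, so y = scale * (x - mean) + beta can be
    computed with a shift instead of a multiply on an accelerator.
    """
    scale = gamma / np.sqrt(var + eps)
    # Round |scale| to the nearest power of two: 2 ** round(log2 |scale|).
    shift = np.round(np.log2(np.abs(scale))).astype(int)
    sign = np.sign(scale)
    # x * scale is approximated by shifting x left/right by `shift` bits
    # (written here as a power-of-two multiply for clarity in floating point).
    return sign * (x - mean) * (2.0 ** shift) + beta
```

When the true scale is already a power of two the approximation is exact; otherwise it introduces at most a factor-of-sqrt(2) error in the scale, which training can compensate for by adjusting gamma.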

     
