Mixed-Clipping Quantization for Convolutional Neural Networks

  • Abstract: Quantization is the main method for compressing convolutional neural networks and accelerating their inference. Most existing quantization methods quantize all layers to the same bit width; mixed-precision quantization can achieve higher accuracy at the same compression ratio, but finding a good mixed-precision quantization strategy is difficult. To solve this problem, a mixed-clipping quantization method for convolutional neural networks based on reinforcement learning is proposed: reinforcement learning is used to search for a mixed-precision quantization strategy, and the weight data are clipped according to the searched strategy before being quantized, which further improves the accuracy of the quantized network. The method is tested on a diverse set of models: the Top-1 accuracy of ResNet18/50 and MobileNet-V2 before and after quantization is measured on ImageNet, and the mAP of YOLOV3 before and after quantization is measured on the Microsoft COCO dataset. Compared with HAQ and ZeroQ, the Top-1 accuracy of MobileNet-V2 quantized to 4 bits is 2.7% and 0.3% higher, respectively; compared with per-layer quantization, the mAP of YOLOV3 quantized to 6 bits is 2.6% higher.
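
The abstract describes the method only at a high level: a reinforcement-learning search chooses a per-layer bit width, and each layer's weights are clipped before being quantized at that width. As a rough illustration of the clip-then-quantize step only, here is a minimal NumPy sketch; the function `clip_and_quantize` and the example `strategy` list (bit width and clipping ratio per layer) are hypothetical names introduced for illustration, not the paper's actual interface, and in the paper the strategy would be produced by the reinforcement-learning search rather than written by hand.

```python
import numpy as np

def clip_and_quantize(weights, bits, clip_ratio):
    # Illustrative sketch, not the paper's implementation: clip the
    # weights to a fraction of their largest magnitude, then apply
    # symmetric uniform quantization at the given bit width.
    threshold = clip_ratio * np.abs(weights).max()
    clipped = np.clip(weights, -threshold, threshold)
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 at 8 bits
    scale = threshold / qmax
    q = np.round(clipped / scale).astype(np.int32)
    return q * scale                    # dequantized values the network uses

# Hypothetical per-layer strategy of the kind a search might return:
# (bit width, clipping ratio) for each layer.
strategy = [(8, 1.0), (4, 0.6), (6, 0.8)]
layers = [np.random.randn(64, 3, 3, 3) for _ in strategy]
quantized = [clip_and_quantize(w, b, c) for w, (b, c) in zip(layers, strategy)]
```

Clipping matters most at low bit widths: at 4 bits there are only 15 symmetric levels, so spending them on a narrower range reduces rounding error for the bulk of the weights at the cost of saturating a few outliers.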
