Mixed-Clipping Quantization for Convolutional Neural Networks
Graphical Abstract
Abstract
Quantization is the main method for compressing convolutional neural networks and accelerating their inference. Most existing quantization methods quantize all layers to the same bit width. Mixed-precision quantization can achieve higher accuracy at the same compression ratio, but finding a good mixed-precision quantization strategy is difficult. To address this problem, we propose a mixed-clipping quantization method based on reinforcement learning. It uses reinforcement learning to search for a mixed-precision quantization strategy, and then applies a mixed-clipping method that clips the weight data according to the searched strategy before quantization. This further improves the accuracy of the quantized network. We extensively test the method on a diverse set of models, including ResNet18/50 and MobileNet-V2 on ImageNet, as well as YOLOV3 on the Microsoft COCO dataset. The experimental results show that our method achieves 2.7% and 0.3% higher Top-1 accuracy on MobileNet-V2 (4 bit) compared to the HAQ and ZeroQ methods, respectively, and 2.6% higher mAP on YOLOV3 (6 bit) compared to the per-layer quantization method.
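For illustration, the minimal sketch below shows what clipping a layer's weights and then quantizing them uniformly might look like once the searched policy has assigned that layer a bit width. The percentile-based threshold and the `clip_ratio` parameter are assumptions made for this example only; they are not the paper's exact mixed-clipping rule.

```python
import numpy as np

def clip_and_quantize(weights, n_bits, clip_ratio=0.95):
    """Clip weights to a per-layer threshold, then quantize uniformly to n_bits.

    Illustrative sketch: the threshold is a percentile of |w| (clip_ratio is a
    hypothetical parameter), not the paper's actual mixed-clipping criterion.
    """
    # Use a high percentile of |w| rather than max|w| as the clipping range,
    # so a few outlier weights do not stretch the quantization grid.
    threshold = np.percentile(np.abs(weights), clip_ratio * 100)
    clipped = np.clip(weights, -threshold, threshold)

    # Symmetric uniform quantization with 2^(n_bits-1) - 1 positive levels.
    n_levels = 2 ** (n_bits - 1) - 1
    scale = threshold / n_levels
    q = np.round(clipped / scale)
    return q * scale  # dequantized weights (simulated quantization)

# Example: a layer that the searched mixed-precision policy assigns 4 bits.
w = np.random.randn(256, 128).astype(np.float32)
w_q = clip_and_quantize(w, n_bits=4)
```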