Pse: mixed quantization framework of neural networks for efficient deployment

Cited: 0
Authors
Yingqing Yang
Guanzhong Tian
Mingyuan Liu
Yihao Chen
Jun Chen
Yong Liu
Yu Pan
Longhua Ma
Affiliations
[1] Zhejiang University, Ningbo Innovation Center
[2] Alibaba Group, State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control
[3] Zhejiang University, State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering
[4] Zhejiang University, School of Information Science and Engineering
[5] NingboTech University
Source
Journal of Real-Time Image Processing, 2023, 20 (06)
Keywords
Neural networks; Quantization; Compression; Acceleration; Data-free
DOI
Not available
Abstract
Quantization is a promising approach for deploying deep neural networks on resource-limited devices. However, existing methods struggle to deliver both computation acceleration and parameter compression while maintaining high accuracy. To achieve this goal, we propose PSE, a mixed quantization framework that combines product quantization (PQ), scalar quantization (SQ), and error correction. Specifically, we first employ PQ to obtain a floating-point codebook and an index matrix for the weight matrix. Then, we use SQ to quantize the codebook into integers and reconstruct an integer weight matrix. Finally, we propose an error correction algorithm that updates the quantized codebook to minimize the quantization error. We extensively evaluate the proposed method on various backbones, including VGG-16, ResNet-18/50, MobileNetV2, ShuffleNetV2, EfficientNet-B3/B7, and DenseNet-201, on the CIFAR-10 and ILSVRC-2012 benchmarks. The experiments demonstrate that PSE reduces computation complexity and model size with acceptable accuracy loss. For example, ResNet-18 achieves a 1.8× acceleration ratio and a 30.4× compression ratio with less than 1.54% accuracy loss on CIFAR-10.
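To make the three-stage pipeline concrete, below is a minimal NumPy sketch of the steps the abstract describes: PQ (per-subspace k-means over the weight matrix, yielding a floating-point codebook and index matrix), SQ (uniform symmetric quantization of the codebook to integers), and an error-correction pass over the quantized codebook. All function names and hyperparameters (n_sub, k, bits, rounds) are illustrative assumptions, and the correction loop is a simple re-fit-and-re-round stand-in, since the abstract does not give PSE's actual update equations.

```python
import numpy as np

def product_quantize(W, n_sub=4, k=16, iters=10, seed=0):
    """PQ step: split the columns of W into n_sub sub-spaces and run k-means
    in each, returning per-subspace float codebooks and an index matrix."""
    rng = np.random.default_rng(seed)
    rows, cols = W.shape
    d = cols // n_sub                      # assumes cols is divisible by n_sub
    books = np.empty((n_sub, k, d))
    idx = np.empty((rows, n_sub), dtype=np.int64)
    for s in range(n_sub):
        sub = W[:, s * d:(s + 1) * d]
        centers = sub[rng.choice(rows, size=k, replace=False)].astype(np.float64)
        for _ in range(iters):
            # assign each sub-vector to its nearest codeword, then re-fit
            dist = ((sub[:, None, :] - centers[None]) ** 2).sum(-1)
            assign = dist.argmin(1)
            for c in range(k):
                members = sub[assign == c]
                if len(members):
                    centers[c] = members.mean(0)
        books[s], idx[:, s] = centers, assign
    return books, idx

def scalar_quantize(books, bits=8):
    """SQ step: uniform symmetric quantization of the float codebook to
    integers; returns the integer codebook and its (shared) scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(books).max() / qmax
    q_books = np.clip(np.round(books / scale), -qmax, qmax).astype(np.int32)
    return q_books, scale

def error_correct(W, q_books, idx, scale, rounds=3, qmax=127):
    """Toy error-correction pass: re-fit each integer codeword to the mean of
    the sub-vectors assigned to it, then re-round and clip. This is a simple
    stand-in, not the update rule from the paper."""
    n_sub, k, d = q_books.shape
    for _ in range(rounds):
        for s in range(n_sub):
            sub = W[:, s * d:(s + 1) * d]
            for c in range(k):
                members = sub[idx[:, s] == c]
                if len(members):
                    q_books[s, c] = np.clip(
                        np.round(members.mean(0) / scale), -qmax, qmax)
    return q_books

def reconstruct(q_books, idx, scale):
    """Rebuild the weight matrix from the integer codebook and index matrix."""
    parts = [q_books[s][idx[:, s]] for s in range(q_books.shape[0])]
    return np.concatenate(parts, axis=1) * scale

# Example: quantize a random 64x32 "weight matrix" and check the error.
W = np.random.default_rng(0).standard_normal((64, 32)).astype(np.float32)
books, idx = product_quantize(W)
q_books, scale = scalar_quantize(books)
q_books = error_correct(W, q_books, idx, scale)
W_hat = reconstruct(q_books, idx, scale)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

In a real integer inference path the reconstructed matrix would stay in integers, with scale folded into the layer's output scaling; the final multiply in reconstruct exists here only to measure the reconstruction error against the original floats.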
Related papers
50 items in total
  • [1] Pse: mixed quantization framework of neural networks for efficient deployment
    Yang, Yingqing
    Tian, Guanzhong
    Liu, Mingyuan
    Chen, Yihao
    Chen, Jun
    Liu, Yong
    Pan, Yu
    Ma, Longhua
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2023, 20 (06)
  • [2] Quantization and Deployment of Deep Neural Networks on Microcontrollers
    Novac, Pierre-Emmanuel
    Boukli Hacene, Ghouthi
    Pegatoquet, Alain
    Miramond, Benoit
    Gripon, Vincent
    SENSORS, 2021, 21 (09)
  • [3] Quantization Framework for Fast Spiking Neural Networks
    Li, Chen
    Ma, Lei
    Furber, Steve
    FRONTIERS IN NEUROSCIENCE, 2022, 16
  • [4] Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks
    Vasquez, Karina
    Venkatesha, Yeshwanth
    Bhattacharjee, Abhiroop
    Moitra, Abhishek
    Panda, Priyadarshini
    PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021), 2021, : 1360 - 1365
  • [5] Flexible Quantization for Efficient Convolutional Neural Networks
    Zacchigna, Federico Giordano
    Lew, Sergio
    Lutenberg, Ariel
    ELECTRONICS, 2024, 13 (10)
  • [6] An efficient segmented quantization for graph neural networks
    Yue Dai
    Xulong Tang
    Youtao Zhang
    CCF Transactions on High Performance Computing, 2022, 4 : 461 - 473
  • [7] An efficient segmented quantization for graph neural networks
    Dai, Yue
    Tang, Xulong
    Zhang, Youtao
    CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING, 2022, 4 (04) : 461 - 473
  • [8] Bit Efficient Quantization for Deep Neural Networks
    Nayak, Prateeth
    Zhang, David
    Chai, Sek
    FIFTH WORKSHOP ON ENERGY EFFICIENT MACHINE LEARNING AND COGNITIVE COMPUTING - NEURIPS EDITION (EMC2-NIPS 2019), 2019, : 52 - 56
  • [9] EVOLUTIONARY QUANTIZATION OF NEURAL NETWORKS WITH MIXED-PRECISION
    Liu, Zhenhua
    Zhang, Xinfeng
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2785 - 2789
  • [10] Mixed-Clipping Quantization for Convolutional Neural Networks
    Huang Z.
    Du H.
    Chang L.
Chang, Libo (changlibo@xupt.edu.cn), 1600, Institute of Computing Technology (33): 553 - 559