PSE: mixed quantization framework of neural networks for efficient deployment

Cited by: 0

Authors
Yingqing Yang
Guanzhong Tian
Mingyuan Liu
Yihao Chen
Jun Chen
Yong Liu
Yu Pan
Longhua Ma
Affiliations
[1] Zhejiang University, Ningbo Innovation Center
[2] Alibaba Group, State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control
[3] Zhejiang University, State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering
[4] Zhejiang University, School of Information Science and Engineering
[5] NingboTech University
Source
Journal of Real-Time Image Processing | 2023 / Vol. 20
Keywords
Neural networks; Quantization; Compression; Acceleration; Data-free
DOI
Not available
Abstract
Quantization is a promising approach to facilitate deploying deep neural networks on resource-limited devices. However, existing methods struggle to obtain both computation acceleration and parameter compression while maintaining good accuracy. To achieve this goal, we propose PSE, a mixed quantization framework that combines product quantization (PQ), scalar quantization (SQ), and error correction. Specifically, we first employ PQ to obtain a floating-point codebook and an index matrix for the weight matrix. Then, we use SQ to quantize the codebook into integers and reconstruct an integer weight matrix. Finally, we propose an error correction algorithm to update the quantized codebook and minimize the quantization error. We extensively evaluate the proposed method on various backbones, including VGG-16, ResNet-18/50, MobileNetV2, ShuffleNetV2, EfficientNet-B3/B7, and DenseNet-201, on the CIFAR-10 and ILSVRC-2012 benchmarks. The experiments demonstrate that PSE reduces computation complexity and model size with acceptable accuracy loss. For example, ResNet-18 achieves a 1.8× acceleration ratio and a 30.4× compression ratio with less than 1.54% accuracy loss on CIFAR-10.
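
To make the pipeline concrete, the following is a minimal NumPy sketch of the PQ, SQ, and reconstruction steps described in the abstract. All names and settings (pq_encode, sq_quantize, reconstruct_int_weight, the sub-vector width, codebook size, and bit width) are illustrative assumptions, and the error-correction update of the quantized codebook is omitted; this is not the authors' implementation.

import numpy as np

def kmeans(vectors, k, iters=20, seed=0):
    # Plain k-means used as the clustering step of product quantization.
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each sub-vector to its nearest centroid.
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids, keeping the old one if a cluster is empty.
        for c in range(k):
            members = vectors[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return centroids, labels

def pq_encode(weight, sub_dim=4, k=16):
    # Product quantization: split each row into sub-vectors and cluster each
    # sub-space, giving a floating-point codebook and an index column per sub-space.
    rows, cols = weight.shape
    assert cols % sub_dim == 0
    codebooks, indices = [], []
    for s in range(cols // sub_dim):
        sub = weight[:, s * sub_dim:(s + 1) * sub_dim]
        cb, idx = kmeans(sub, k)
        codebooks.append(cb)
        indices.append(idx)
    return codebooks, indices

def sq_quantize(codebook, num_bits=8):
    # Scalar quantization: map the floating-point codebook to signed integers.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(codebook).max() / qmax
    q = np.clip(np.round(codebook / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def reconstruct_int_weight(codebooks, indices):
    # Rebuild an integer weight matrix by looking up integer codewords row by row.
    parts = []
    for cb, idx in zip(codebooks, indices):
        q_cb, _ = sq_quantize(cb)
        parts.append(q_cb[idx])
    return np.concatenate(parts, axis=1)

if __name__ == "__main__":
    w = np.random.default_rng(1).standard_normal((64, 32)).astype(np.float32)  # toy weight matrix
    codebooks, indices = pq_encode(w)
    w_int = reconstruct_int_weight(codebooks, indices)
    print(w_int.shape, w_int.dtype)  # (64, 32) int8

In the paper's full pipeline, the error-correction step would further update the quantized codebook to reduce the reconstruction error; in this sketch the scalar-quantized codebook is used as-is.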
Related papers
50 records in total
  • [21] Efficient Deployment of Spiking Neural Networks on SpiNNaker Neuromorphic Platform
    Galanis, Ioannis
    Anagnostopoulos, Iraklis
    Nguyen, Chinh
    Bares, Guillermo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2021, 68 (06) : 1937 - 1941
  • [22] HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
    Dong, Zhen
    Yao, Zhewei
    Gholami, Amir
    Mahoney, Michael W.
    Keutzer, Kurt
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 293 - 302
  • [23] Regularized Training Framework for Combining Pruning and Quantization to Compress Neural Networks
    Ding, Qimin
    Zhang, Ruonan
    Jiang, Yi
    Zhai, Daosen
    Li, Bin
    2019 11TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING (WCSP), 2019,
  • [24] Efficient Quantization for Neural Networks with Binary Weights and Low Bitwidth Activations
    Huang, Kun
    Ni, Bingbing
    Yang, Xiaokang
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 3854 - 3861
  • [25] Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
    Chen, Weihan
    Wang, Peisong
    Cheng, Jian
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 5330 - 5339
  • [26] Mixed-precision quantization for neural networks based on error limit (Invited)
    Li Y.
    Guo Z.
    Liu K.
    Sun X.
    Hongwai yu Jiguang Gongcheng/Infrared and Laser Engineering, 2022, 51 (04):
  • [27] Mixed-precision quantization-aware training for photonic neural networks
    Kirtas, Manos
    Passalis, Nikolaos
    Oikonomou, Athina
    Moralis-Pegios, Miltos
    Giamougiannis, George
    Tsakyridis, Apostolos
    Mourgias-Alexandris, George
    Pleros, Nikolaos
    Tefas, Anastasios
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (29): : 21361 - 21379
  • [29] aHCQ: Adaptive Hierarchical Clustering Based Quantization Framework for Deep Neural Networks
    Hu, Jiaxin
    Rao, Weixiong
    Zhao, Qinpei
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT II, 2021, 12713 : 207 - 218
  • [30] Vector quantization of neural networks
    Chu, WC
    Bose, NK
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 1998, 9 (06): : 1235 - 1245