PSE: mixed quantization framework of neural networks for efficient deployment

Cited by: 0
Authors
Yingqing Yang
Guanzhong Tian
Mingyuan Liu
Yihao Chen
Jun Chen
Yong Liu
Yu Pan
Longhua Ma
Affiliations
[1] Zhejiang University, Ningbo Innovation Center
[2] Alibaba Group; State Key Laboratory of Industrial Control Technology, Institute of CyberSystems and Control
[3] Zhejiang University, State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering
[4] Zhejiang University, School of Information Science and Engineering
[5] NingboTech University
Source
Journal of Real-Time Image Processing | 2023, Vol. 20
Keywords
Neural networks; Quantization; Compression; Acceleration; Data-free
DOI
Not available
Abstract
Quantization is a promising approach for deploying deep neural networks on resource-limited devices. However, existing methods struggle to obtain both computation acceleration and parameter compression while maintaining excellent performance. To achieve this goal, we propose PSE, a mixed quantization framework that combines product quantization (PQ), scalar quantization (SQ), and error correction. Specifically, we first employ PQ to obtain the floating-point codebook and index matrix of the weight matrix. Then, we use SQ to quantize the codebook into integers and reconstruct an integer weight matrix. Finally, we propose an error correction algorithm to update the quantized codebook and minimize the quantization error. We extensively evaluate the proposed method on various backbones, including VGG-16, ResNet-18/50, MobileNetV2, ShuffleNetV2, EfficientNet-B3/B7, and DenseNet-201, on the CIFAR-10 and ILSVRC-2012 benchmarks. The experiments demonstrate that PSE reduces computation complexity and model size with acceptable accuracy loss. For example, ResNet-18 achieves a 1.8× acceleration ratio and a 30.4× compression ratio with less than 1.54% accuracy loss on CIFAR-10.
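The following is a minimal sketch, in Python with NumPy and scikit-learn, of the PQ-then-SQ pipeline described in the abstract. The function and parameter names (pq_sq_quantize, subvector_dim, n_centroids) are illustrative assumptions rather than the authors' implementation, and the paper's error-correction step that further updates the quantized codebook is omitted.

# Sketch of the PQ + SQ pipeline from the abstract (illustrative, not the authors' code).
import numpy as np
from sklearn.cluster import KMeans


def pq_sq_quantize(weight, subvector_dim=4, n_centroids=256, int_bits=8):
    """Product-quantize a 2-D weight matrix, then scalar-quantize the codebook to integers."""
    out_dim, in_dim = weight.shape
    assert in_dim % subvector_dim == 0, "in_dim must be divisible by subvector_dim"

    # Product quantization: cluster the sub-vectors of the weight matrix with k-means
    # to obtain a floating-point codebook and an index matrix.
    subvectors = weight.reshape(-1, subvector_dim)
    kmeans = KMeans(n_clusters=n_centroids, n_init=4, random_state=0).fit(subvectors)
    codebook = kmeans.cluster_centers_                       # floating-point codebook
    index_matrix = kmeans.labels_.reshape(out_dim, in_dim // subvector_dim)

    # Scalar quantization: map the floating-point codebook to signed integers with one scale.
    qmax = 2 ** (int_bits - 1) - 1
    scale = np.abs(codebook).max() / qmax
    int_codebook = np.clip(np.round(codebook / scale), -qmax - 1, qmax).astype(np.int8)

    # Reconstruct an integer weight matrix by looking up the integer codebook with the indices.
    int_weight = int_codebook[index_matrix].reshape(out_dim, in_dim)
    return int_codebook, index_matrix, scale, int_weight


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 128)).astype(np.float32)
    int_cb, idx, s, W_int = pq_sq_quantize(W)
    # The relative reconstruction error gives a rough sense of the quantization loss
    # that the paper's error-correction step would then reduce.
    err = np.linalg.norm(W - W_int.astype(np.float32) * s) / np.linalg.norm(W)
    print(f"relative reconstruction error: {err:.3f}")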