Pse: mixed quantization framework of neural networks for efficient deployment

Cited: 0
Authors
Yingqing Yang
Guanzhong Tian
Mingyuan Liu
Yihao Chen
Jun Chen
Yong Liu
Yu Pan
Longhua Ma
Affiliations
[1] Zhejiang University, Ningbo Innovation Center
[2] Alibaba Group, State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control
[3] Zhejiang University, State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering
[4] Zhejiang University, School of Information Science and Engineering
[5] NingboTech University
Source
Journal of Real-Time Image Processing, 2023, 20 (06)
Keywords
Neural networks; Quantization; Compression; Acceleration; Data-free
DOI
Not available
Abstract
Quantization is a promising approach for deploying deep neural networks on resource-limited devices. However, existing methods struggle to deliver both computation acceleration and parameter compression while maintaining high accuracy. To achieve this goal, we propose PSE, a mixed quantization framework that combines product quantization (PQ), scalar quantization (SQ), and error correction. Specifically, we first employ PQ to obtain a floating-point codebook and an index matrix for the weight matrix. Then, we use SQ to quantize the codebook into integers and reconstruct an integer weight matrix. Finally, we propose an error correction algorithm that updates the quantized codebook to minimize the quantization error. We extensively evaluate the proposed method on various backbones, including VGG-16, ResNet-18/50, MobileNetV2, ShuffleNetV2, EfficientNet-B3/B7, and DenseNet-201, on the CIFAR-10 and ILSVRC-2012 benchmarks. The experiments demonstrate that PSE reduces computation complexity and model size with acceptable accuracy loss. For example, ResNet-18 achieves a 1.8× acceleration ratio and a 30.4× compression ratio with less than 1.54% accuracy loss on CIFAR-10.
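To make the three-stage pipeline concrete, below is a minimal NumPy sketch of the steps the abstract describes: PQ (per-subspace k-means over the weight matrix, yielding a floating-point codebook and index matrix), SQ (uniform symmetric quantization of the codebook to integers), and an error-correction pass over the quantized codebook. All function names and hyperparameters (n_sub, k, bits, rounds) are illustrative assumptions, and the correction loop is a simple re-fit-and-re-round stand-in, since the abstract does not give PSE's actual update equations.

```python
import numpy as np

def product_quantize(W, n_sub=4, k=16, iters=10, seed=0):
    """PQ step: split the columns of W into n_sub sub-spaces and run k-means
    in each, returning per-subspace float codebooks and an index matrix."""
    rng = np.random.default_rng(seed)
    rows, cols = W.shape
    d = cols // n_sub                      # assumes cols is divisible by n_sub
    books = np.empty((n_sub, k, d))
    idx = np.empty((rows, n_sub), dtype=np.int64)
    for s in range(n_sub):
        sub = W[:, s * d:(s + 1) * d]
        centers = sub[rng.choice(rows, size=k, replace=False)].astype(np.float64)
        for _ in range(iters):
            # assign each sub-vector to its nearest codeword, then re-fit
            dist = ((sub[:, None, :] - centers[None]) ** 2).sum(-1)
            assign = dist.argmin(1)
            for c in range(k):
                members = sub[assign == c]
                if len(members):
                    centers[c] = members.mean(0)
        books[s], idx[:, s] = centers, assign
    return books, idx

def scalar_quantize(books, bits=8):
    """SQ step: uniform symmetric quantization of the float codebook to
    integers; returns the integer codebook and its (shared) scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(books).max() / qmax
    q_books = np.clip(np.round(books / scale), -qmax, qmax).astype(np.int32)
    return q_books, scale

def error_correct(W, q_books, idx, scale, rounds=3, qmax=127):
    """Toy error-correction pass: re-fit each integer codeword to the mean of
    the sub-vectors assigned to it, then re-round and clip. This is a simple
    stand-in, not the update rule from the paper."""
    n_sub, k, d = q_books.shape
    for _ in range(rounds):
        for s in range(n_sub):
            sub = W[:, s * d:(s + 1) * d]
            for c in range(k):
                members = sub[idx[:, s] == c]
                if len(members):
                    q_books[s, c] = np.clip(
                        np.round(members.mean(0) / scale), -qmax, qmax)
    return q_books

def reconstruct(q_books, idx, scale):
    """Rebuild the weight matrix from the integer codebook and index matrix."""
    parts = [q_books[s][idx[:, s]] for s in range(q_books.shape[0])]
    return np.concatenate(parts, axis=1) * scale

# Example: quantize a random 64x32 "weight matrix" and check the error.
W = np.random.default_rng(0).standard_normal((64, 32)).astype(np.float32)
books, idx = product_quantize(W)
q_books, scale = scalar_quantize(books)
q_books = error_correct(W, q_books, idx, scale)
W_hat = reconstruct(q_books, idx, scale)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

In a real integer inference path the reconstructed matrix would stay in integers, with scale folded into the layer's output scaling; the final multiply in reconstruct exists here only to measure the reconstruction error against the original floats.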
Related papers
50 items in total
  • [1] Pse: mixed quantization framework of neural networks for efficient deployment
    Yang, Yingqing
    Tian, Guanzhong
    Liu, Mingyuan
    Chen, Yihao
    Chen, Jun
    Liu, Yong
    Pan, Yu
    Ma, Longhua
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2023, 20 (06)
  • [2] Quantization and Deployment of Deep Neural Networks on Microcontrollers
    Novac, Pierre-Emmanuel
    Boukli Hacene, Ghouthi
    Pegatoquet, Alain
    Miramond, Benoit
    Gripon, Vincent
    SENSORS, 2021, 21 (09)
  • [3] Quantization Framework for Fast Spiking Neural Networks
    Li, Chen
    Ma, Lei
    Furber, Steve
    FRONTIERS IN NEUROSCIENCE, 2022, 16
  • [4] Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks
    Vasquez, Karina
    Venkatesha, Yeshwanth
    Bhattacharjee, Abhiroop
    Moitra, Abhishek
    Panda, Priyadarshini
    PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021), 2021, : 1360 - 1365
  • [5] Flexible Quantization for Efficient Convolutional Neural Networks
    Zacchigna, Federico Giordano
    Lew, Sergio
    Lutenberg, Ariel
    ELECTRONICS, 2024, 13 (10)
  • [6] An efficient segmented quantization for graph neural networks
    Yue Dai
    Xulong Tang
    Youtao Zhang
    CCF Transactions on High Performance Computing, 2022, 4 : 461 - 473
  • [7] An efficient segmented quantization for graph neural networks
    Dai, Yue
    Tang, Xulong
    Zhang, Youtao
    CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING, 2022, 4 (04) : 461 - 473
  • [8] Bit Efficient Quantization for Deep Neural Networks
    Nayak, Prateeth
    Zhang, David
    Chai, Sek
    FIFTH WORKSHOP ON ENERGY EFFICIENT MACHINE LEARNING AND COGNITIVE COMPUTING - NEURIPS EDITION (EMC2-NIPS 2019), 2019, : 52 - 56
  • [9] EVOLUTIONARY QUANTIZATION OF NEURAL NETWORKS WITH MIXED-PRECISION
    Liu, Zhenhua
    Zhang, Xinfeng
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2785 - 2789
  • [10] Mixed-Clipping Quantization for Convolutional Neural Networks
    Huang Z.
    Du H.
    Chang L.
Chang, Libo (changlibo@xupt.edu.cn), 1600, Institute of Computing Technology (33): 553 - 559