PSE: mixed quantization framework of neural networks for efficient deployment

Cited by: 0
Authors
Yingqing Yang
Guanzhong Tian
Mingyuan Liu
Yihao Chen
Jun Chen
Yong Liu
Yu Pan
Longhua Ma
Affiliations
[1] Zhejiang University, Ningbo Innovation Center
[2] Alibaba Group; State Key Laboratory of Industrial Control Technology, Institute of CyberSystems and Control
[3] Zhejiang University, State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering
[4] Zhejiang University, School of Information Science and Engineering
[5] NingboTech University
Source
Journal of Real-Time Image Processing | 2023, Vol. 20
Keywords
Neural networks; Quantization; Compression; Acceleration; Data-free
DOI
Not available
Abstract
Quantization is a promising approach for deploying deep neural networks on resource-limited devices. However, existing methods struggle to obtain both computation acceleration and parameter compression while maintaining excellent performance. To achieve this goal, we propose PSE, a mixed quantization framework that combines product quantization (PQ), scalar quantization (SQ), and error correction. Specifically, we first employ PQ to obtain the floating-point codebook and index matrix of the weight matrix. Then, we use SQ to quantize the codebook into integers and reconstruct an integer weight matrix. Finally, we propose an error correction algorithm to update the quantized codebook and minimize the quantization error. We extensively evaluate the proposed method on various backbones, including VGG-16, ResNet-18/50, MobileNetV2, ShuffleNetV2, EfficientNet-B3/B7, and DenseNet-201, on the CIFAR-10 and ILSVRC-2012 benchmarks. The experiments demonstrate that PSE reduces computation complexity and model size with acceptable accuracy loss. For example, ResNet-18 achieves a 1.8× acceleration ratio and a 30.4× compression ratio with less than 1.54% accuracy loss on CIFAR-10.
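The following is a minimal sketch, in Python with NumPy and scikit-learn, of the PQ-then-SQ pipeline described in the abstract. The function and parameter names (pq_sq_quantize, subvector_dim, n_centroids) are illustrative assumptions rather than the authors' implementation, and the paper's error-correction step that further updates the quantized codebook is omitted.

# Sketch of the PQ + SQ pipeline from the abstract (illustrative, not the authors' code).
import numpy as np
from sklearn.cluster import KMeans


def pq_sq_quantize(weight, subvector_dim=4, n_centroids=256, int_bits=8):
    """Product-quantize a 2-D weight matrix, then scalar-quantize the codebook to integers."""
    out_dim, in_dim = weight.shape
    assert in_dim % subvector_dim == 0, "in_dim must be divisible by subvector_dim"

    # Product quantization: cluster the sub-vectors of the weight matrix with k-means
    # to obtain a floating-point codebook and an index matrix.
    subvectors = weight.reshape(-1, subvector_dim)
    kmeans = KMeans(n_clusters=n_centroids, n_init=4, random_state=0).fit(subvectors)
    codebook = kmeans.cluster_centers_                       # floating-point codebook
    index_matrix = kmeans.labels_.reshape(out_dim, in_dim // subvector_dim)

    # Scalar quantization: map the floating-point codebook to signed integers with one scale.
    qmax = 2 ** (int_bits - 1) - 1
    scale = np.abs(codebook).max() / qmax
    int_codebook = np.clip(np.round(codebook / scale), -qmax - 1, qmax).astype(np.int8)

    # Reconstruct an integer weight matrix by looking up the integer codebook with the indices.
    int_weight = int_codebook[index_matrix].reshape(out_dim, in_dim)
    return int_codebook, index_matrix, scale, int_weight


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 128)).astype(np.float32)
    int_cb, idx, s, W_int = pq_sq_quantize(W)
    # The relative reconstruction error gives a rough sense of the quantization loss
    # that the paper's error-correction step would then reduce.
    err = np.linalg.norm(W - W_int.astype(np.float32) * s) / np.linalg.norm(W)
    print(f"relative reconstruction error: {err:.3f}")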