PSE: mixed quantization framework of neural networks for efficient deployment

Cited by: 0

Authors
Yingqing Yang
Guanzhong Tian
Mingyuan Liu
Yihao Chen
Jun Chen
Yong Liu
Yu Pan
Longhua Ma
Affiliations
[1] Zhejiang University, Ningbo Innovation Center
[2] Alibaba Group, State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control
[3] Zhejiang University, State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering
[4] Zhejiang University, School of Information Science and Engineering
[5] NingboTech University
Source
Journal of Real-Time Image Processing | 2023 / Vol. 20
Keywords
Neural networks; Quantization; Compression; Acceleration; Data-free
DOI
Not available
Abstract
Quantization is a promising approach to facilitate deploying deep neural networks on resource-limited devices. However, existing methods struggle to obtain both computation acceleration and parameter compression while maintaining good accuracy. To achieve this goal, we propose PSE, a mixed quantization framework that combines product quantization (PQ), scalar quantization (SQ), and error correction. Specifically, we first employ PQ to obtain a floating-point codebook and an index matrix for the weight matrix. Then, we use SQ to quantize the codebook into integers and reconstruct an integer weight matrix. Finally, we propose an error correction algorithm to update the quantized codebook and minimize the quantization error. We extensively evaluate the proposed method on various backbones, including VGG-16, ResNet-18/50, MobileNetV2, ShuffleNetV2, EfficientNet-B3/B7, and DenseNet-201, on the CIFAR-10 and ILSVRC-2012 benchmarks. The experiments demonstrate that PSE reduces computation complexity and model size with acceptable accuracy loss. For example, ResNet-18 achieves a 1.8× acceleration ratio and a 30.4× compression ratio with less than 1.54% accuracy loss on CIFAR-10.
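
To make the pipeline concrete, the following is a minimal NumPy sketch of the PQ, SQ, and reconstruction steps described in the abstract. All names and settings (pq_encode, sq_quantize, reconstruct_int_weight, the sub-vector width, codebook size, and bit width) are illustrative assumptions, and the error-correction update of the quantized codebook is omitted; this is not the authors' implementation.

import numpy as np

def kmeans(vectors, k, iters=20, seed=0):
    # Plain k-means used as the clustering step of product quantization.
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each sub-vector to its nearest centroid.
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids, keeping the old one if a cluster is empty.
        for c in range(k):
            members = vectors[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return centroids, labels

def pq_encode(weight, sub_dim=4, k=16):
    # Product quantization: split each row into sub-vectors and cluster each
    # sub-space, giving a floating-point codebook and an index column per sub-space.
    rows, cols = weight.shape
    assert cols % sub_dim == 0
    codebooks, indices = [], []
    for s in range(cols // sub_dim):
        sub = weight[:, s * sub_dim:(s + 1) * sub_dim]
        cb, idx = kmeans(sub, k)
        codebooks.append(cb)
        indices.append(idx)
    return codebooks, indices

def sq_quantize(codebook, num_bits=8):
    # Scalar quantization: map the floating-point codebook to signed integers.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(codebook).max() / qmax
    q = np.clip(np.round(codebook / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def reconstruct_int_weight(codebooks, indices):
    # Rebuild an integer weight matrix by looking up integer codewords row by row.
    parts = []
    for cb, idx in zip(codebooks, indices):
        q_cb, _ = sq_quantize(cb)
        parts.append(q_cb[idx])
    return np.concatenate(parts, axis=1)

if __name__ == "__main__":
    w = np.random.default_rng(1).standard_normal((64, 32)).astype(np.float32)  # toy weight matrix
    codebooks, indices = pq_encode(w)
    w_int = reconstruct_int_weight(codebooks, indices)
    print(w_int.shape, w_int.dtype)  # (64, 32) int8

In the paper's full pipeline, the error-correction step would further update the quantized codebook to reduce the reconstruction error; in this sketch the scalar-quantized codebook is used as-is.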
Related papers
50 records in total
  • [21] Efficient Deployment of Spiking Neural Networks on SpiNNaker Neuromorphic Platform
    Galanis, Ioannis
    Anagnostopoulos, Iraklis
    Nguyen, Chinh
    Bares, Guillermo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2021, 68 (06) : 1937 - 1941
  • [22] HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
    Dong, Zhen
    Yao, Zhewei
    Gholami, Amir
    Mahoney, Michael W.
    Keutzer, Kurt
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 293 - 302
  • [23] Regularized Training Framework for Combining Pruning and Quantization to Compress Neural Networks
    Ding, Qimin
    Zhang, Ruonan
    Jiang, Yi
    Zhai, Daosen
    Li, Bin
    2019 11TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING (WCSP), 2019,
  • [24] Efficient Quantization for Neural Networks with Binary Weights and Low Bitwidth Activations
    Huang, Kun
    Ni, Bingbing
    Yang, Xiaokang
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 3854 - 3861
  • [25] Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
    Chen, Weihan
    Wang, Peisong
    Cheng, Jian
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 5330 - 5339
  • [26] Mixed-precision quantization for neural networks based on error limit (Invited)
    Li Y.
    Guo Z.
    Liu K.
    Sun X.
    Hongwai yu Jiguang Gongcheng/Infrared and Laser Engineering, 2022, 51 (04):
  • [27] Mixed-precision quantization-aware training for photonic neural networks
    Kirtas, Manos
    Passalis, Nikolaos
    Oikonomou, Athina
    Moralis-Pegios, Miltos
    Giamougiannis, George
    Tsakyridis, Apostolos
    Mourgias-Alexandris, George
    Pleros, Nikolaos
    Tefas, Anastasios
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (29): : 21361 - 21379
  • [29] aHCQ: Adaptive Hierarchical Clustering Based Quantization Framework for Deep Neural Networks
    Hu, Jiaxin
    Rao, Weixiong
    Zhao, Qinpei
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT II, 2021, 12713 : 207 - 218
  • [30] Vector quantization of neural networks
    Chu, WC
    Bose, NK
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 1998, 9 (06): : 1235 - 1245