Joint Optimization of Dimension Reduction and Mixed-Precision Quantization for Activation Compression of Neural Networks

Cited by: 2
Authors
Tai, Yu-Shan [1 ]
Chang, Cheng-Yang [1 ]
Teng, Chieh-Fang [2 ]
Chen, Yi-Ta [1 ]
Wu, An-Yeu [1 ]
Affiliations
[1] Natl Taiwan Univ, GIEE, Taipei 10617, Taiwan
[2] Mediatek, CAI2, Hsinchu 300, Taiwan
Keywords
Quantization (signal); Dimensionality reduction; Optimization; Principal component analysis; Memory management; Measurement; Discrete cosine transforms; Activation compression (AC); convolutional neural network (CNN); dimension reduction (DR); mixed-precision (MP) quantization;
DOI
10.1109/TCAD.2023.3248503
CLC Classification Number
TP3 [Computing technology, computer technology]
Discipline Code
0812
Abstract
Recently, deep convolutional neural networks (CNNs) have achieved eye-catching results in various applications. However, intensive memory access of activations introduces considerable energy consumption, posing a great challenge for deploying CNNs on resource-constrained edge devices. Existing research applies dimension reduction (DR) and mixed-precision (MP) quantization separately to reduce computational complexity, without paying attention to their interaction. Such naive concatenation of compression strategies ends up with suboptimal performance. To develop a comprehensive compression framework, we propose an optimization system that jointly considers DR and MP quantization, enabled by independent groupwise learnable MP schemes. Group partitioning is guided by a well-designed automatic group partition mechanism that distinguishes compression priorities among channels and handles the tradeoff between model accuracy and compressibility. Moreover, to preserve model accuracy under low bit-width quantization, we propose a dynamic bit-width searching technique that enables continuous bit-width reduction. Our experimental results show that the proposed system reaches 69.03%/70.73% accuracy with an average of 2.16/2.61 bits per value on ResNet18/MobileNetV2, while introducing only approximately 1% accuracy loss relative to the uncompressed full-precision models. Compared with individual activation compression schemes on ResNet18/MobileNetV2, the proposed joint optimization system reduces memory access by 55%/9% (-2.62/-0.27 bits) relative to DR alone and by 55%/63% (-2.60/-4.52 bits) relative to MP quantization alone, with comparable or even higher accuracy.
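To make the groupwise learnable mixed-precision idea concrete, the sketch below is a minimal PyTorch-style illustration (not the authors' implementation) that fake-quantizes activations with one learnable bit-width per channel group. The module name GroupwiseActQuant, the equal-size channel partition, and the 2-8 bit search range are assumptions made for illustration; the paper instead derives its partition from the automatic group partition mechanism and adjusts bit-widths with its dynamic searching technique.

import torch
import torch.nn as nn

def ste_round(t):
    # Straight-through estimator: round in the forward pass,
    # identity gradient in the backward pass.
    return torch.round(t).detach() + t - t.detach()

class GroupwiseActQuant(nn.Module):
    # Fake-quantizes activations with one learnable bit-width per channel
    # group; an equal-size partition stands in for the paper's automatic
    # group partition mechanism (illustrative assumption).
    def __init__(self, num_channels, num_groups=4, init_bits=8.0):
        super().__init__()
        assert num_channels % num_groups == 0, "equal-size groups assumed"
        self.num_groups = num_groups
        self.group_size = num_channels // num_groups
        # One continuous, learnable bit-width per group (mixed precision).
        self.bits = nn.Parameter(torch.full((num_groups,), init_bits))

    def forward(self, x):
        # x: (N, C, H, W) non-negative activations (e.g., after ReLU).
        outs = []
        for g in range(self.num_groups):
            xg = x[:, g * self.group_size:(g + 1) * self.group_size]
            # Constrain the search range, then round the continuous
            # bit-width with an STE so gradients still reach self.bits.
            b = ste_round(self.bits[g].clamp(2.0, 8.0))
            levels = 2.0 ** b - 1.0
            scale = (xg.detach().abs().max() + 1e-8) / levels
            # Quantize and dequantize (fake quantization) per group.
            q = torch.minimum(ste_round(xg / scale).clamp(min=0.0), levels)
            outs.append(q * scale)
        return torch.cat(outs, dim=1)

In a full activation-compression pipeline, one would apply such a module to each layer's activations after the DR transform and penalize the groups' bit-widths, weighted by activation volume, in the training loss; this loosely mirrors the accuracy-versus-compressibility tradeoff that the joint optimization in the paper addresses.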
Pages: 4025-4037
Number of pages: 13
Related Papers
50 records (first 10 shown)
  • [1] EVOLUTIONARY QUANTIZATION OF NEURAL NETWORKS WITH MIXED-PRECISION
    Liu, Zhenhua
    Zhang, Xinfeng
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2785 - 2789
  • [2] Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
    Chen, Weihan
    Wang, Peisong
    Cheng, Jian
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 5330 - 5339
  • [3] Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks
    Vasquez, Karina
    Venkatesha, Yeshwanth
    Bhattacharjee, Abhiroop
    Moitra, Abhishek
    Panda, Priyadarshini
    PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021), 2021, : 1360 - 1365
  • [4] HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
    Dong, Zhen
    Yao, Zhewei
    Gholami, Amir
    Mahoney, Michael W.
    Keutzer, Kurt
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 293 - 302
  • [5] Mixed-precision Deep Neural Network Quantization With Multiple Compression Rates
    Wang, Xuanda
    Fei, Wen
    Dai, Wenrui
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    2023 DATA COMPRESSION CONFERENCE, DCC, 2023, : 371 - 371
  • [6] Joint Pruning and Channel-Wise Mixed-Precision Quantization for Efficient Deep Neural Networks
    Motetti, Beatrice Alessandra
    Risso, Matteo
    Burrello, Alessio
    Macii, Enrico
    Poncino, Massimo
    Pagliari, Daniele Jahier
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (11) : 2619 - 2633
  • [7] Mixed-precision quantization-aware training for photonic neural networks
    Kirtas, Manos
    Passalis, Nikolaos
    Oikonomou, Athina
    Moralis-Pegios, Miltos
    Giamougiannis, George
    Tsakyridis, Apostolos
    Mourgias-Alexandris, George
    Pleros, Nikolaos
    Tefas, Anastasios
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (29): 21361 - 21379
  • [8] Mixed-precision quantization for neural networks based on error limit (Invited)
    Li Y.
    Guo Z.
    Liu K.
    Sun X.
    Hongwai yu Jiguang Gongcheng/Infrared and Laser Engineering, 2022, 51 (04):
  • [9] Hessian-based mixed-precision quantization with transition aware training for neural networks
    Huang, Zhiyong
    Han, Xiao
    Yu, Zhi
    Zhao, Yunlan
    Hou, Mingyang
    Hu, Shengdong
    NEURAL NETWORKS, 2025, 182