Joint Optimization of Dimension Reduction and Mixed-Precision Quantization for Activation Compression of Neural Networks

Cited by: 2
Authors
Tai, Yu-Shan [1]
Chang, Cheng-Yang [1]
Teng, Chieh-Fang [2]
Chen, Yi-Ta [1]
Wu, An-Yeu [1]
Affiliations
[1] National Taiwan University, GIEE, Taipei 10617, Taiwan
[2] MediaTek, CAI2, Hsinchu 300, Taiwan
Keywords
Quantization (signal); Dimensionality reduction; Optimization; Principal component analysis; Memory management; Measurement; Discrete cosine transforms; Activation compression (AC); Convolutional neural network (CNN); Dimension reduction (DR); Mixed-precision (MP) quantization
DOI
10.1109/TCAD.2023.3248503
Chinese Library Classification (CLC)
TP3 [computing technology; computer technology]
Discipline code
0812
Abstract
Recently, deep convolutional neural networks (CNNs) have achieved eye-catching results in various applications. However, intensive memory access for activations introduces considerable energy consumption, posing a great challenge for deploying CNNs on resource-constrained edge devices. Existing research applies dimension reduction (DR) and mixed-precision (MP) quantization separately to reduce computational complexity, without paying attention to their interaction; such naive concatenation of different compression strategies ends up with suboptimal performance. To develop a comprehensive compression framework, we propose an optimization system that jointly considers DR and MP quantization, enabled by independent groupwise learnable MP schemes. Group partitioning is guided by a well-designed automatic group partition mechanism that distinguishes compression priorities among channels and handles the tradeoff between model accuracy and compressibility. Moreover, to preserve model accuracy under low-bit-width quantization, we propose a dynamic bit-width searching technique that enables continuous bit-width reduction. Our experimental results show that the proposed system reaches 69.03%/70.73% accuracy with an average of 2.16/2.61 bits per value on ResNet18/MobileNetV2, while introducing only approximately 1% accuracy loss relative to the uncompressed full-precision models. Compared with individual activation compression schemes, the proposed joint optimization system reduces memory access by 55%/9% (-2.62/-0.27 bits) relative to DR alone and by 55%/63% (-2.60/-4.52 bits) relative to MP quantization alone on ResNet18/MobileNetV2, with comparable or even higher accuracy.
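The core idea stated in the abstract, groupwise mixed-precision quantization of activations in which channel groups with different compression priorities receive different bit-widths, can be illustrated with a short sketch. The Python snippet below is a minimal illustration only, not the authors' implementation: the variance-based channel ranking, the fixed group count, and the example bit-width assignment (8/6/4/2) are assumptions chosen for demonstration, and the paper's DR stage and dynamic bit-width search are omitted.

    # Minimal sketch of groupwise mixed-precision activation quantization.
    # Assumptions (not from the paper): variance is used as the channel
    # importance proxy, groups are equally sized, and bit-widths are fixed.
    import numpy as np

    def partition_channels(acts, num_groups=4):
        # acts: (N, C, H, W) activation tensor; rank channels by per-channel
        # variance and split the ranked indices into equally sized groups.
        per_channel_var = acts.var(axis=(0, 2, 3))
        order = np.argsort(per_channel_var)[::-1]   # most "important" channels first
        return np.array_split(order, num_groups)    # list of channel-index arrays

    def quantize_group(x, bits):
        # Uniform asymmetric fake-quantization of a sub-tensor to `bits` bits.
        lo, hi = x.min(), x.max()
        levels = 2 ** bits - 1
        scale = (hi - lo) / levels if hi > lo else 1.0
        return np.round((x - lo) / scale) * scale + lo

    def groupwise_mp_quantize(acts, group_bits=(8, 6, 4, 2)):
        # Assign one bit-width per channel group and quantize each group independently.
        groups = partition_channels(acts, num_groups=len(group_bits))
        out = np.empty_like(acts)
        for chans, bits in zip(groups, group_bits):
            out[:, chans] = quantize_group(acts[:, chans], bits)
        avg_bits = sum(b * len(g) for g, b in zip(groups, group_bits)) / acts.shape[1]
        return out, avg_bits

    # Toy usage on a random activation map.
    acts = np.random.randn(8, 64, 14, 14).astype(np.float32)
    q_acts, avg_bits = groupwise_mp_quantize(acts)
    print("mean abs error:", np.abs(acts - q_acts).mean(), "| average bits/value:", avg_bits)

In the proposed system, by contrast, the per-group bit-widths are learnable and searched dynamically rather than fixed, group partitioning is automated, and DR is optimized jointly with the quantization, as the abstract describes.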
Pages: 4025-4037
Page count: 13
Related Papers (50 total)
  • [31] Hardware-Centric AutoML for Mixed-Precision Quantization. Wang, Kuan; Liu, Zhijian; Lin, Yujun; Lin, Ji; Han, Song. International Journal of Computer Vision, 2020, 128: 2035-2048.
  • [32] Optimizing Information Theory Based Bitwise Bottlenecks for Efficient Mixed-Precision Activation Quantization. Zhou, Xichuan; Liu, Kui; Shi, Cong; Liu, Haijun; Liu, Ji. Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence, 2021, 35: 3590-3598.
  • [33] AMED: Automatic Mixed-Precision Quantization for Edge Devices. Kimhi, Moshe; Rozen, Tal; Mendelson, Avi; Baskin, Chaim. Mathematics, 2024, 12 (12).
  • [34] Hierarchical Mixed-Precision Post-Training Quantization for SAR Ship Detection Networks. Wei, Hang; Wang, Zulin; Ni, Yuanhan. Remote Sensing, 2024, 16 (21).
  • [35] Mixed-Precision Network Quantization for Infrared Small Target Segmentation. Li, Boyang; Wang, Longguang; Wang, Yingqian; Wu, Tianhao; Lin, Zaiping; Li, Miao; An, Wei; Guo, Yulan. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 1-12.
  • [36] Generalizable Mixed-Precision Quantization via Attribution Rank Preservation. Wang, Ziwei; Xiao, Han; Lu, Jiwen; Zhou, Jie. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 5271-5280.
  • [37] Learning Generalizable Mixed-Precision Quantization via Attribution Imitation. Wang, Ziwei; Xiao, Han; Zhou, Jie; Lu, Jiwen. International Journal of Computer Vision, 2024, 132 (11): 5101-5123.
  • [38] Evaluating the Impact of Mixed-Precision on Fault Propagation for Deep Neural Networks on GPUs. Dos Santos, Fernando Fernandes; Rech, Paolo; Kritikakou, Angeliki; Sentieys, Olivier. 2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2022), 2022: 327.
  • [39] Enabling Mixed-Precision Quantized Neural Networks in Extreme-Edge Devices. Bruschi, Nazareno; Garofalo, Angelo; Conti, Francesco; Tagliavini, Giuseppe; Rossi, Davide. 17th ACM International Conference on Computing Frontiers 2020 (CF 2020), 2020: 217-220.
  • [40] Patch-wise Mixed-Precision Quantization of Vision Transformer. Xiao, Junrui; Li, Zhikai; Yang, Lianwei; Gu, Qingyi. 2023 International Joint Conference on Neural Networks (IJCNN), 2023.