Joint Optimization of Dimension Reduction and Mixed-Precision Quantization for Activation Compression of Neural Networks

Citations: 2
Authors
Tai, Yu-Shan [1 ]
Chang, Cheng-Yang [1 ]
Teng, Chieh-Fang [2 ]
Chen, Yi-Ta [1 ]
Wu, An-Yeu [1 ]
Affiliations
[1] Natl Taiwan Univ, GIEE, Taipei 10617, Taiwan
[2] Mediatek, CAI2, Hsinchu 300, Taiwan
Keywords
Quantization (signal); Dimensionality reduction; Optimization; Principal component analysis; Memory management; Measurement; Discrete cosine transforms; Activation compression (AC); convolutional neural network (CNN); dimension reduction (DR); mixed-precision (MP) quantization
DOI
10.1109/TCAD.2023.3248503
CLC Classification
TP3 [Computing technology, computer technology]
Discipline Code
0812
Abstract
Recently, deep convolutional neural networks (CNNs) have achieved eye-catching results in various applications. However, intensive memory access for activations introduces considerable energy consumption, posing a great challenge for deploying CNNs on resource-constrained edge devices. Existing research applies dimension reduction (DR) and mixed-precision (MP) quantization separately to reduce computational complexity, without paying attention to their interaction; such naive concatenation of different compression strategies ends up with suboptimal performance. To develop a comprehensive compression framework, we propose an optimization system that jointly considers DR and MP quantization, enabled by independent groupwise learnable MP schemes. Group partitioning is guided by a well-designed automatic group partition mechanism that distinguishes compression priorities among channels and handles the tradeoff between model accuracy and compressibility. Moreover, to preserve model accuracy under low bit-width quantization, we propose a dynamic bit-width searching technique that enables continuous bit-width reduction. Our experimental results show that the proposed system reaches 69.03%/70.73% accuracy with an average of 2.16/2.61 bits per value on ResNet18/MobileNetV2, while introducing only approximately 1% accuracy loss relative to the uncompressed full-precision models. Compared with individual activation compression schemes, the proposed joint optimization system reduces memory access by 55%/9% (-2.62/-0.27 bits) relative to DR alone and by 55%/63% (-2.60/-4.52 bits) relative to MP quantization alone on ResNet18/MobileNetV2, with comparable or even higher accuracy.
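Note: the abstract only sketches the pipeline at a high level. As a rough illustration of how DR and groupwise MP quantization compose, the following minimal NumPy sketch projects activations onto their top principal components and then quantizes contiguous, variance-ordered component groups at different bit-widths. The names (uniform_quantize, dr_mp_compress), the contiguous grouping, and the 8/4/2-bit assignment are illustrative assumptions, not the paper's actual automatic group partition mechanism or dynamic bit-width search.

import numpy as np

def uniform_quantize(x, bits):
    # Symmetric uniform quantizer; qmax is the largest positive code.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax if np.abs(x).max() > 0 else 1.0
    return np.round(x / scale).clip(-qmax - 1, qmax) * scale

def dr_mp_compress(act, n_components, group_bits):
    # act: (N, C) matrix of activations, one row per sample.
    mean = act.mean(axis=0)
    centered = act - mean
    # PCA via SVD; rows of vt are principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:n_components].T  # (N, n_components): the DR step
    # Split components into contiguous groups; since SVD orders them by
    # explained variance, earlier groups carry more information and get
    # more bits (a stand-in for the paper's automatic group partition).
    groups = np.array_split(np.arange(n_components), len(group_bits))
    for idx, bits in zip(groups, group_bits):
        proj[:, idx] = uniform_quantize(proj[:, idx], bits)
    # Reconstruct an approximation in the original channel space.
    return proj @ vt[:n_components] + mean

# Usage: 64 channels reduced to 16 components, quantized at 8/4/2 bits.
rng = np.random.default_rng(0)
act = rng.normal(size=(512, 64))
rec = dr_mp_compress(act, n_components=16, group_bits=[8, 4, 2])
print("reconstruction MSE:", float(np.mean((act - rec) ** 2)))

In this toy accounting, the 16 kept components split into groups of 6/5/5, so storage per original value is (6*8 + 5*4 + 5*2) / 64 ≈ 1.22 bits, illustrating the kind of joint DR-plus-MP saving the abstract quantifies; the paper's learned scheme additionally trades bit-widths off against accuracy during training.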
Pages: 4025-4037
Page count: 13