Joint Optimization of Dimension Reduction and Mixed-Precision Quantization for Activation Compression of Neural Networks

Cited by: 2
Authors
Tai, Yu-Shan [1 ]
Chang, Cheng-Yang [1 ]
Teng, Chieh-Fang [2 ]
Chen, Yi-Ta [1 ]
Wu, An-Yeu [1 ]
Affiliations
[1] Natl Taiwan Univ, GIEE, Taipei 10617, Taiwan
[2] Mediatek, CAI2, Hsinchu 300, Taiwan
Keywords
Quantization (signal); Dimensionality reduction; Optimization; Principal component analysis; Memory management; Measurement; Discrete cosine transforms; Activation compression (AC); convolutional neural network (CNN); dimension reduction (DR); mixed-precision (MP) quantization;
DOI
10.1109/TCAD.2023.3248503
CLC Number
TP3 [Computing Technology and Computer Technology]
Discipline Code
0812
Abstract
Recently, deep convolutional neural networks (CNNs) have achieved eye-catching results in various applications. However, intensive memory access for activations introduces considerable energy consumption, posing a great challenge for deploying CNNs on resource-constrained edge devices. Existing research applies dimension reduction (DR) and mixed-precision (MP) quantization separately to reduce computational complexity, without paying attention to their interaction; such naive concatenation of different compression strategies ends up with suboptimal performance. To develop a comprehensive compression framework, we propose an optimization system that jointly considers DR and MP quantization, enabled by independent groupwise learnable MP schemes. Group partitioning is guided by a well-designed automatic group partition mechanism that distinguishes compression priorities among channels and handles the tradeoff between model accuracy and compressibility. Moreover, to preserve model accuracy under low bit-width quantization, we propose a dynamic bit-width searching technique that enables continuous bit-width reduction. Our experimental results show that the proposed system reaches 69.03%/70.73% accuracy with an average of 2.16/2.61 bits per value on ResNet18/MobileNetV2, while introducing only approximately 1% accuracy loss relative to the uncompressed full-precision models. Compared with individual activation compression schemes, the proposed joint optimization system reduces memory access by 55%/9% (-2.62/-0.27 bits) relative to DR and by 55%/63% (-2.60/-4.52 bits) relative to MP quantization on ResNet18/MobileNetV2, with comparable or even higher accuracy.
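The core idea above, quantizing different channel groups of an activation tensor at different bit-widths so that compressible channels get fewer bits, can be illustrated with a minimal sketch. This is not the paper's method (which learns the group partition and bit-widths jointly with DR); the function names, the fixed two-group partition, and the uniform symmetric quantizer here are illustrative assumptions only.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniform symmetric quantization of x to the given bit-width."""
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return x.copy()
    # 2**bits - 1 representable levels, symmetric about zero
    scale = max_abs / ((2 ** bits - 1) / 2)
    return np.round(x / scale) * scale

def groupwise_mp_quantize(act, group_bits):
    """Quantize channel groups of an activation map at different bit-widths.

    act: array of shape (C, H, W)
    group_bits: list of (channel_index_list, bits) pairs covering all channels
    """
    out = np.empty_like(act)
    for idx, bits in group_bits:
        out[idx] = quantize_uniform(act[idx], bits)
    return out

# Example: 8 channels; the first group is treated as accuracy-sensitive
# (8 bits), the second as highly compressible (2 bits).
rng = np.random.default_rng(0)
act = rng.standard_normal((8, 4, 4))
groups = [(list(range(0, 4)), 8), (list(range(4, 8)), 2)]
q = groupwise_mp_quantize(act, groups)

# Average bits per activation value, analogous to the "bits per value"
# metric reported in the abstract.
avg_bits = (4 * 8 + 4 * 2) / 8  # = 5.0
```

The 8-bit group reconstructs its activations far more faithfully than the 2-bit group, which is why the partition (which channels tolerate low precision) matters and is worth optimizing jointly with the DR transform.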
Pages: 4025-4037
Page count: 13
Related Papers
50 records in total
  • [41] Mixed-precision architecture based on computational memory for training deep neural networks
    Nandakumar, S. R.
    Le Gallo, Manuel
    Boybat, Irem
    Rajendran, Bipin
    Sebastian, Abu
    Eleftheriou, Evangelos
    2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2018,
  • [42] GradFreeBits: Gradient-Free Bit Allocation for Mixed-Precision Neural Networks
    Bodner, Benjamin Jacob
    Ben-Shalom, Gil
    Treister, Eran
    SENSORS, 2022, 22 (24)
  • [43] AutoMPQ: Automatic Mixed-Precision Neural Network Search via Few-Shot Quantization Adapter
    Xu, Ke
    Shao, Xiangyang
    Tian, Ye
    Yang, Shangshang
    Zhang, Xingyi
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, : 1 - 13
  • [44] CMQ: Crossbar-Aware Neural Network Mixed-Precision Quantization via Differentiable Architecture Search
    Peng, Jie
    Liu, Haijun
    Zhao, Zhongjin
    Li, Zhiwei
    Liu, Sen
    Li, Qingjiang
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (11) : 4124 - 4133
  • [45] Campo: Cost-Aware Performance Optimization for Mixed-Precision Neural Network Training
    He, Xin
    Sun, Jianhua
    Chen, Hao
    Li, Dong
    PROCEEDINGS OF THE 2022 USENIX ANNUAL TECHNICAL CONFERENCE, 2022, : 505 - 518
  • [46] A Mixed-Precision Quantization Method without Accuracy Degradation Using Semilayers
    Matsumoto, Kengo
    Matsuda, Tomoya
    Inoue, Atsuki
    Kawaguchi, Hiroshi
    Sakai, Yasufumi
    ASIAN CONFERENCE ON MACHINE LEARNING, VOL 222, 2023, 222
  • [47] Mixed-precision Quantization with Dynamical Hessian Matrix for Object Detection Network
    Yang, Zerui
    Fei, Wen
    Dai, Wenrui
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021,
  • [48] Mixed-Precision Quantization of U-Net for Medical Image Segmentation
    Guo, Liming
    Fei, Wen
    Dai, Wenrui
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 2871 - 2875
  • [49] A NOVEL SENSITIVITY METRIC FOR MIXED-PRECISION QUANTIZATION WITH SYNTHETIC DATA GENERATION
    Lee, Donghyun
    Cho, Minkyoung
    Lee, Seungwon
    Song, Joonho
    Choi, Changkyu
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1294 - 1298
  • [50] Entropy-Driven Mixed-Precision Quantization for Deep Network Design
    Sun, Zhenhong
    Ge, Ce
    Wang, Junyan
    Lin, Ming
    Chen, Hesen
    Li, Hao
    Sun, Xiuyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,