Joint Optimization of Dimension Reduction and Mixed-Precision Quantization for Activation Compression of Neural Networks

Cited by: 2
Authors
Tai, Yu-Shan [1 ]
Chang, Cheng-Yang [1 ]
Teng, Chieh-Fang [2 ]
Chen, Yi-Ta [1 ]
Wu, An-Yeu [1 ]
Affiliations
[1] Natl Taiwan Univ, GIEE, Taipei 10617, Taiwan
[2] Mediatek, CAI2, Hsinchu 300, Taiwan
Keywords
Quantization (signal); Dimensionality reduction; Optimization; Principal component analysis; Memory management; Measurement; Discrete cosine transforms; Activation compression (AC); convolutional neural network (CNN); dimension reduction (DR); mixed-precision (MP) quantization;
DOI
10.1109/TCAD.2023.3248503
CLC Classification Number
TP3 [Computing technology, computer technology]
Discipline Code
0812
Abstract
Recently, deep convolutional neural networks (CNNs) have achieved eye-catching results in various applications. However, intensive memory access of activations introduces considerable energy consumption, posing a great challenge for deploying CNNs on resource-constrained edge devices. Existing research applies dimension reduction (DR) and mixed-precision (MP) quantization separately to reduce computational complexity, without paying attention to their interaction. Such naive concatenation of compression strategies ends up with suboptimal performance. To develop a comprehensive compression framework, we propose an optimization system that jointly considers DR and MP quantization, enabled by independent groupwise learnable MP schemes. Group partitioning is guided by a well-designed automatic group partition mechanism that distinguishes compression priorities among channels and handles the tradeoff between model accuracy and compressibility. Moreover, to preserve model accuracy under low bit-width quantization, we propose a dynamic bit-width searching technique that enables continuous bit-width reduction. Our experimental results show that the proposed system reaches 69.03%/70.73% accuracy with an average of 2.16/2.61 bits per value on ResNet18/MobileNetV2, while introducing only approximately 1% accuracy loss relative to the uncompressed full-precision models. Compared with individual activation compression schemes on ResNet18/MobileNetV2, the proposed joint optimization system reduces memory access by 55%/9% (-2.62/-0.27 bits) relative to DR alone and by 55%/63% (-2.60/-4.52 bits) relative to MP quantization alone, with comparable or even higher accuracy.
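To make the groupwise learnable mixed-precision idea concrete, the sketch below is a minimal PyTorch-style illustration (not the authors' implementation) that fake-quantizes activations with one learnable bit-width per channel group. The module name GroupwiseActQuant, the equal-size channel partition, and the 2-8 bit search range are assumptions made for illustration; the paper instead derives its partition from the automatic group partition mechanism and adjusts bit-widths with its dynamic searching technique.

import torch
import torch.nn as nn

def ste_round(t):
    # Straight-through estimator: round in the forward pass,
    # identity gradient in the backward pass.
    return torch.round(t).detach() + t - t.detach()

class GroupwiseActQuant(nn.Module):
    # Fake-quantizes activations with one learnable bit-width per channel
    # group; an equal-size partition stands in for the paper's automatic
    # group partition mechanism (illustrative assumption).
    def __init__(self, num_channels, num_groups=4, init_bits=8.0):
        super().__init__()
        assert num_channels % num_groups == 0, "equal-size groups assumed"
        self.num_groups = num_groups
        self.group_size = num_channels // num_groups
        # One continuous, learnable bit-width per group (mixed precision).
        self.bits = nn.Parameter(torch.full((num_groups,), init_bits))

    def forward(self, x):
        # x: (N, C, H, W) non-negative activations (e.g., after ReLU).
        outs = []
        for g in range(self.num_groups):
            xg = x[:, g * self.group_size:(g + 1) * self.group_size]
            # Constrain the search range, then round the continuous
            # bit-width with an STE so gradients still reach self.bits.
            b = ste_round(self.bits[g].clamp(2.0, 8.0))
            levels = 2.0 ** b - 1.0
            scale = (xg.detach().abs().max() + 1e-8) / levels
            # Quantize and dequantize (fake quantization) per group.
            q = torch.minimum(ste_round(xg / scale).clamp(min=0.0), levels)
            outs.append(q * scale)
        return torch.cat(outs, dim=1)

In a full activation-compression pipeline, one would apply such a module to each layer's activations after the DR transform and penalize the groups' bit-widths, weighted by activation volume, in the training loss; this loosely mirrors the accuracy-versus-compressibility tradeoff that the joint optimization in the paper addresses.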
Pages: 4025-4037
Number of pages: 13
Related Papers
50 records (first 10 shown)
  • [1] EVOLUTIONARY QUANTIZATION OF NEURAL NETWORKS WITH MIXED-PRECISION
    Liu, Zhenhua
    Zhang, Xinfeng
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2785 - 2789
  • [2] Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
    Chen, Weihan
    Wang, Peisong
    Cheng, Jian
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 5330 - 5339
  • [3] Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks
    Vasquez, Karina
    Venkatesha, Yeshwanth
    Bhattacharjee, Abhiroop
    Moitra, Abhishek
    Panda, Priyadarshini
    PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021), 2021, : 1360 - 1365
  • [4] HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
    Dong, Zhen
    Yao, Zhewei
    Gholami, Amir
    Mahoney, Michael W.
    Keutzer, Kurt
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 293 - 302
  • [5] Mixed-precision Deep Neural Network Quantization With Multiple Compression Rates
    Wang, Xuanda
    Fei, Wen
    Dai, Wenrui
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    2023 DATA COMPRESSION CONFERENCE, DCC, 2023, : 371 - 371
  • [6] Joint Pruning and Channel-Wise Mixed-Precision Quantization for Efficient Deep Neural Networks
    Motetti, Beatrice Alessandra
    Risso, Matteo
    Burrello, Alessio
    Macii, Enrico
    Poncino, Massimo
    Pagliari, Daniele Jahier
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (11) : 2619 - 2633
  • [7] Mixed-precision quantization-aware training for photonic neural networks
    Kirtas, Manos
    Passalis, Nikolaos
    Oikonomou, Athina
    Moralis-Pegios, Miltos
    Giamougiannis, George
    Tsakyridis, Apostolos
    Mourgias-Alexandris, George
    Pleros, Nikolaos
    Tefas, Anastasios
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (29): 21361 - 21379
  • [8] Mixed-precision quantization for neural networks based on error limit (Invited)
    Li Y.
    Guo Z.
    Liu K.
    Sun X.
    Hongwai yu Jiguang Gongcheng/Infrared and Laser Engineering, 2022, 51 (04):
  • [9] Hessian-based mixed-precision quantization with transition aware training for neural networks
    Huang, Zhiyong
    Han, Xiao
    Yu, Zhi
    Zhao, Yunlan
    Hou, Mingyang
    Hu, Shengdong
    NEURAL NETWORKS, 2025, 182