Joint Optimization of Dimension Reduction and Mixed-Precision Quantization for Activation Compression of Neural Networks

Cited by: 2
Authors
Tai, Yu-Shan [1]
Chang, Cheng-Yang [1]
Teng, Chieh-Fang [2]
Chen, Yi-Ta [1]
Wu, An-Yeu [1]
Affiliations
[1] National Taiwan University, GIEE, Taipei 10617, Taiwan
[2] MediaTek, CAI2, Hsinchu 300, Taiwan
Keywords
Quantization (signal); Dimensionality reduction; Optimization; Principal component analysis; Memory management; Measurement; Discrete cosine transforms; Activation compression (AC); Convolutional neural network (CNN); Dimension reduction (DR); Mixed-precision (MP) quantization
DOI
10.1109/TCAD.2023.3248503
Chinese Library Classification (CLC)
TP3 [computing technology; computer technology]
Discipline code
0812
Abstract
Recently, deep convolutional neural networks (CNNs) have achieved eye-catching results in various applications. However, intensive memory access for activations introduces considerable energy consumption, posing a great challenge for deploying CNNs on resource-constrained edge devices. Existing research applies dimension reduction (DR) and mixed-precision (MP) quantization separately to reduce computational complexity, without paying attention to their interaction; such naive concatenation of different compression strategies ends up with suboptimal performance. To develop a comprehensive compression framework, we propose an optimization system that jointly considers DR and MP quantization, enabled by independent groupwise learnable MP schemes. Group partitioning is guided by a well-designed automatic group partition mechanism that distinguishes compression priorities among channels and handles the tradeoff between model accuracy and compressibility. Moreover, to preserve model accuracy under low-bit-width quantization, we propose a dynamic bit-width searching technique that enables continuous bit-width reduction. Our experimental results show that the proposed system reaches 69.03%/70.73% accuracy with an average of 2.16/2.61 bits per value on ResNet18/MobileNetV2, while introducing only approximately 1% accuracy loss relative to the uncompressed full-precision models. Compared with individual activation compression schemes, the proposed joint optimization system reduces memory access by 55%/9% (-2.62/-0.27 bits) relative to DR alone and by 55%/63% (-2.60/-4.52 bits) relative to MP quantization alone on ResNet18/MobileNetV2, with comparable or even higher accuracy.
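The core idea stated in the abstract, groupwise mixed-precision quantization of activations in which channel groups with different compression priorities receive different bit-widths, can be illustrated with a short sketch. The Python snippet below is a minimal illustration only, not the authors' implementation: the variance-based channel ranking, the fixed group count, and the example bit-width assignment (8/6/4/2) are assumptions chosen for demonstration, and the paper's DR stage and dynamic bit-width search are omitted.

    # Minimal sketch of groupwise mixed-precision activation quantization.
    # Assumptions (not from the paper): variance is used as the channel
    # importance proxy, groups are equally sized, and bit-widths are fixed.
    import numpy as np

    def partition_channels(acts, num_groups=4):
        # acts: (N, C, H, W) activation tensor; rank channels by per-channel
        # variance and split the ranked indices into equally sized groups.
        per_channel_var = acts.var(axis=(0, 2, 3))
        order = np.argsort(per_channel_var)[::-1]   # most "important" channels first
        return np.array_split(order, num_groups)    # list of channel-index arrays

    def quantize_group(x, bits):
        # Uniform asymmetric fake-quantization of a sub-tensor to `bits` bits.
        lo, hi = x.min(), x.max()
        levels = 2 ** bits - 1
        scale = (hi - lo) / levels if hi > lo else 1.0
        return np.round((x - lo) / scale) * scale + lo

    def groupwise_mp_quantize(acts, group_bits=(8, 6, 4, 2)):
        # Assign one bit-width per channel group and quantize each group independently.
        groups = partition_channels(acts, num_groups=len(group_bits))
        out = np.empty_like(acts)
        for chans, bits in zip(groups, group_bits):
            out[:, chans] = quantize_group(acts[:, chans], bits)
        avg_bits = sum(b * len(g) for g, b in zip(groups, group_bits)) / acts.shape[1]
        return out, avg_bits

    # Toy usage on a random activation map.
    acts = np.random.randn(8, 64, 14, 14).astype(np.float32)
    q_acts, avg_bits = groupwise_mp_quantize(acts)
    print("mean abs error:", np.abs(acts - q_acts).mean(), "| average bits/value:", avg_bits)

In the proposed system, by contrast, the per-group bit-widths are learnable and searched dynamically rather than fixed, group partitioning is automated, and DR is optimized jointly with the quantization, as the abstract describes.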
Pages: 4025-4037
Page count: 13
Related Papers (50 total)
  • [31] Hardware-Centric AutoML for Mixed-Precision Quantization. Wang, Kuan; Liu, Zhijian; Lin, Yujun; Lin, Ji; Han, Song. International Journal of Computer Vision, 2020, 128: 2035-2048.
  • [32] Optimizing Information Theory Based Bitwise Bottlenecks for Efficient Mixed-Precision Activation Quantization. Zhou, Xichuan; Liu, Kui; Shi, Cong; Liu, Haijun; Liu, Ji. Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence, 2021, 35: 3590-3598.
  • [33] AMED: Automatic Mixed-Precision Quantization for Edge Devices. Kimhi, Moshe; Rozen, Tal; Mendelson, Avi; Baskin, Chaim. Mathematics, 2024, 12 (12).
  • [34] Hierarchical Mixed-Precision Post-Training Quantization for SAR Ship Detection Networks. Wei, Hang; Wang, Zulin; Ni, Yuanhan. Remote Sensing, 2024, 16 (21).
  • [35] Mixed-Precision Network Quantization for Infrared Small Target Segmentation. Li, Boyang; Wang, Longguang; Wang, Yingqian; Wu, Tianhao; Lin, Zaiping; Li, Miao; An, Wei; Guo, Yulan. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 1-12.
  • [36] Generalizable Mixed-Precision Quantization via Attribution Rank Preservation. Wang, Ziwei; Xiao, Han; Lu, Jiwen; Zhou, Jie. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 5271-5280.
  • [37] Learning Generalizable Mixed-Precision Quantization via Attribution Imitation. Wang, Ziwei; Xiao, Han; Zhou, Jie; Lu, Jiwen. International Journal of Computer Vision, 2024, 132 (11): 5101-5123.
  • [38] Evaluating the Impact of Mixed-Precision on Fault Propagation for Deep Neural Networks on GPUs. Dos Santos, Fernando Fernandes; Rech, Paolo; Kritikakou, Angeliki; Sentieys, Olivier. 2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2022), 2022: 327.
  • [39] Enabling Mixed-Precision Quantized Neural Networks in Extreme-Edge Devices. Bruschi, Nazareno; Garofalo, Angelo; Conti, Francesco; Tagliavini, Giuseppe; Rossi, Davide. 17th ACM International Conference on Computing Frontiers 2020 (CF 2020), 2020: 217-220.
  • [40] Patch-wise Mixed-Precision Quantization of Vision Transformer. Xiao, Junrui; Li, Zhikai; Yang, Lianwei; Gu, Qingyi. 2023 International Joint Conference on Neural Networks (IJCNN), 2023.