Joint Optimization of Dimension Reduction and Mixed-Precision Quantization for Activation Compression of Neural Networks

Citations: 2
Authors
Tai, Yu-Shan [1 ]
Chang, Cheng-Yang [1 ]
Teng, Chieh-Fang [2 ]
Chen, Yi-Ta [1 ]
Wu, An-Yeu [1 ]
Affiliations
[1] Natl Taiwan Univ, GIEE, Taipei 10617, Taiwan
[2] Mediatek, CAI2, Hsinchu 300, Taiwan
Keywords
Quantization (signal); Dimensionality reduction; Optimization; Principal component analysis; Memory management; Measurement; Discrete cosine transforms; Activation compression (AC); convolutional neural network (CNN); dimension reduction (DR); mixed-precision (MP) quantization
DOI
10.1109/TCAD.2023.3248503
CLC Classification
TP3 [Computing technology, computer technology]
Discipline Code
0812
Abstract
Recently, deep convolutional neural networks (CNNs) have achieved eye-catching results in various applications. However, intensive memory access for activations introduces considerable energy consumption, posing a great challenge for deploying CNNs on resource-constrained edge devices. Existing research applies dimension reduction (DR) and mixed-precision (MP) quantization separately to reduce computational complexity, without paying attention to their interaction; such naive concatenation of different compression strategies ends up with suboptimal performance. To develop a comprehensive compression framework, we propose an optimization system that jointly considers DR and MP quantization, enabled by independent groupwise learnable MP schemes. Group partitioning is guided by a well-designed automatic group partition mechanism that distinguishes compression priorities among channels and handles the tradeoff between model accuracy and compressibility. Moreover, to preserve model accuracy under low bit-width quantization, we propose a dynamic bit-width searching technique that enables continuous bit-width reduction. Our experimental results show that the proposed system reaches 69.03%/70.73% accuracy with an average of 2.16/2.61 bits per value on ResNet18/MobileNetV2, while introducing only approximately 1% accuracy loss relative to the uncompressed full-precision models. Compared with individual activation compression schemes, the proposed joint optimization system reduces memory access by 55%/9% (-2.62/-0.27 bits) relative to DR alone and by 55%/63% (-2.60/-4.52 bits) relative to MP quantization alone on ResNet18/MobileNetV2, with comparable or even higher accuracy.
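Note: the abstract only sketches the pipeline at a high level. As a rough illustration of how DR and groupwise MP quantization compose, the following minimal NumPy sketch projects activations onto their top principal components and then quantizes contiguous, variance-ordered component groups at different bit-widths. The names (uniform_quantize, dr_mp_compress), the contiguous grouping, and the 8/4/2-bit assignment are illustrative assumptions, not the paper's actual automatic group partition mechanism or dynamic bit-width search.

import numpy as np

def uniform_quantize(x, bits):
    # Symmetric uniform quantizer; qmax is the largest positive code.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax if np.abs(x).max() > 0 else 1.0
    return np.round(x / scale).clip(-qmax - 1, qmax) * scale

def dr_mp_compress(act, n_components, group_bits):
    # act: (N, C) matrix of activations, one row per sample.
    mean = act.mean(axis=0)
    centered = act - mean
    # PCA via SVD; rows of vt are principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:n_components].T  # (N, n_components): the DR step
    # Split components into contiguous groups; since SVD orders them by
    # explained variance, earlier groups carry more information and get
    # more bits (a stand-in for the paper's automatic group partition).
    groups = np.array_split(np.arange(n_components), len(group_bits))
    for idx, bits in zip(groups, group_bits):
        proj[:, idx] = uniform_quantize(proj[:, idx], bits)
    # Reconstruct an approximation in the original channel space.
    return proj @ vt[:n_components] + mean

# Usage: 64 channels reduced to 16 components, quantized at 8/4/2 bits.
rng = np.random.default_rng(0)
act = rng.normal(size=(512, 64))
rec = dr_mp_compress(act, n_components=16, group_bits=[8, 4, 2])
print("reconstruction MSE:", float(np.mean((act - rec) ** 2)))

In this toy accounting, the 16 kept components split into groups of 6/5/5, so storage per original value is (6*8 + 5*4 + 5*2) / 64 ≈ 1.22 bits, illustrating the kind of joint DR-plus-MP saving the abstract quantifies; the paper's learned scheme additionally trades bit-widths off against accuracy during training.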
Pages: 4025-4037
Page count: 13