Joint Optimization of Dimension Reduction and Mixed-Precision Quantization for Activation Compression of Neural Networks

Cited by: 2
Authors
Tai, Yu-Shan [1 ]
Chang, Cheng-Yang [1 ]
Teng, Chieh-Fang [2 ]
Chen, Yi-Ta [1 ]
Wu, An-Yeu [1 ]
Affiliations
[1] Natl Taiwan Univ, GIEE, Taipei 10617, Taiwan
[2] Mediatek, CAI2, Hsinchu 300, Taiwan
Keywords
Quantization (signal); Dimensionality reduction; Optimization; Principal component analysis; Memory management; Measurement; Discrete cosine transforms; Activation compression (AC); convolutional neural network (CNN); dimension reduction (DR); mixed-precision (MP) quantization;
DOI
10.1109/TCAD.2023.3248503
CLC Number
TP3 [Computing Technology and Computer Technology]
Discipline Code
0812
Abstract
Recently, deep convolutional neural networks (CNNs) have achieved eye-catching results in various applications. However, intensive memory access for activations introduces considerable energy consumption, posing a great challenge for deploying CNNs on resource-constrained edge devices. Existing research applies dimension reduction (DR) and mixed-precision (MP) quantization separately to reduce computational complexity, without paying attention to their interaction; such naive concatenation of different compression strategies ends up with suboptimal performance. To develop a comprehensive compression framework, we propose an optimization system that jointly considers DR and MP quantization, enabled by independent groupwise learnable MP schemes. Group partitioning is guided by a well-designed automatic group partition mechanism that distinguishes compression priorities among channels and handles the tradeoff between model accuracy and compressibility. Moreover, to preserve model accuracy under low bit-width quantization, we propose a dynamic bit-width searching technique that enables continuous bit-width reduction. Our experimental results show that the proposed system reaches 69.03%/70.73% accuracy with an average of 2.16/2.61 bits per value on ResNet18/MobileNetV2, while introducing only approximately 1% accuracy loss relative to the uncompressed full-precision models. Compared with individual activation compression schemes, the proposed joint optimization system reduces memory access by 55%/9% (-2.62/-0.27 bits) relative to DR and by 55%/63% (-2.60/-4.52 bits) relative to MP quantization on ResNet18/MobileNetV2, with comparable or even higher accuracy.
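The core idea above, quantizing different channel groups of an activation tensor at different bit-widths so that compressible channels get fewer bits, can be illustrated with a minimal sketch. This is not the paper's method (which learns the group partition and bit-widths jointly with DR); the function names, the fixed two-group partition, and the uniform symmetric quantizer here are illustrative assumptions only.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniform symmetric quantization of x to the given bit-width."""
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return x.copy()
    # 2**bits - 1 representable levels, symmetric about zero
    scale = max_abs / ((2 ** bits - 1) / 2)
    return np.round(x / scale) * scale

def groupwise_mp_quantize(act, group_bits):
    """Quantize channel groups of an activation map at different bit-widths.

    act: array of shape (C, H, W)
    group_bits: list of (channel_index_list, bits) pairs covering all channels
    """
    out = np.empty_like(act)
    for idx, bits in group_bits:
        out[idx] = quantize_uniform(act[idx], bits)
    return out

# Example: 8 channels; the first group is treated as accuracy-sensitive
# (8 bits), the second as highly compressible (2 bits).
rng = np.random.default_rng(0)
act = rng.standard_normal((8, 4, 4))
groups = [(list(range(0, 4)), 8), (list(range(4, 8)), 2)]
q = groupwise_mp_quantize(act, groups)

# Average bits per activation value, analogous to the "bits per value"
# metric reported in the abstract.
avg_bits = (4 * 8 + 4 * 2) / 8  # = 5.0
```

The 8-bit group reconstructs its activations far more faithfully than the 2-bit group, which is why the partition (which channels tolerate low precision) matters and is worth optimizing jointly with the DR transform.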
Pages: 4025-4037
Page count: 13
Related Papers
50 records in total
  • [41] Mixed-precision architecture based on computational memory for training deep neural networks
    Nandakumar, S. R.
    Le Gallo, Manuel
    Boybat, Irem
    Rajendran, Bipin
    Sebastian, Abu
    Eleftheriou, Evangelos
    2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2018,
  • [42] GradFreeBits: Gradient-Free Bit Allocation for Mixed-Precision Neural Networks
    Bodner, Benjamin Jacob
    Ben-Shalom, Gil
    Treister, Eran
    SENSORS, 2022, 22 (24)
  • [43] AutoMPQ: Automatic Mixed-Precision Neural Network Search via Few-Shot Quantization Adapter
    Xu, Ke
    Shao, Xiangyang
    Tian, Ye
    Yang, Shangshang
    Zhang, Xingyi
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, : 1 - 13
  • [44] CMQ: Crossbar-Aware Neural Network Mixed-Precision Quantization via Differentiable Architecture Search
    Peng, Jie
    Liu, Haijun
    Zhao, Zhongjin
    Li, Zhiwei
    Liu, Sen
    Li, Qingjiang
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (11) : 4124 - 4133
  • [45] Campo: Cost-Aware Performance Optimization for Mixed-Precision Neural Network Training
    He, Xin
    Sun, Jianhua
    Chen, Hao
    Li, Dong
    PROCEEDINGS OF THE 2022 USENIX ANNUAL TECHNICAL CONFERENCE, 2022, : 505 - 518
  • [46] A Mixed-Precision Quantization Method without Accuracy Degradation Using Semilayers
    Matsumoto, Kengo
    Matsuda, Tomoya
    Inoue, Atsuki
    Kawaguchi, Hiroshi
    Sakai, Yasufumi
    ASIAN CONFERENCE ON MACHINE LEARNING, VOL 222, 2023, 222
  • [47] Mixed-precision Quantization with Dynamical Hessian Matrix for Object Detection Network
    Yang, Zerui
    Fei, Wen
    Dai, Wenrui
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021,
  • [48] Mixed-Precision Quantization of U-Net for Medical Image Segmentation
    Guo, Liming
    Fei, Wen
    Dai, Wenrui
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 2871 - 2875
  • [49] A NOVEL SENSITIVITY METRIC FOR MIXED-PRECISION QUANTIZATION WITH SYNTHETIC DATA GENERATION
    Lee, Donghyun
    Cho, Minkyoung
    Lee, Seungwon
    Song, Joonho
    Choi, Changkyu
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1294 - 1298
  • [50] Entropy-Driven Mixed-Precision Quantization for Deep Network Design
    Sun, Zhenhong
    Ge, Ce
    Wang, Junyan
    Lin, Ming
    Chen, Hesen
    Li, Hao
    Sun, Xiuyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,