Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Cited by: 2
Authors
Balaskas, Konstantinos [1 ,2 ]
Karatzas, Andreas [3 ]
Sad, Christos [4 ]
Siozios, Kostas [4 ]
Anagnostopoulos, Iraklis [3 ]
Zervakis, Georgios [5 ]
Henkel, Jorg [6 ]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[2] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
[3] Southern Illinois Univ, Sch Elect Comp & Biomed Engn, Carbondale, IL 62901 USA
[4] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[5] Univ Patras, Dept Comp Engn & Informat, Patras 26504, Greece
[6] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
Funding
EU Horizon 2020
Keywords
Quantization (signal); Artificial neural networks; Hardware; Optimization; Energy efficiency; Energy consumption; Tuning; Deep neural networks (DNNs); DNN accelerators; DNN compression; energy efficiency; pruning; quantization; reinforcement learning;
DOI
10.1109/TETC.2023.3346944
CLC Number
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy-hungry at an exponential pace, while at the same time there is vast demand for running sophisticated DNN-based services on resource-constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework that compresses DNNs in a hardware-aware manner by jointly employing pruning and quantization. We explore, for the first time, per-layer fine- and coarse-grained pruning within the same DNN architecture, in addition to low bit-width mixed-precision quantization for weights and activations. Reinforcement Learning (RL) is used to explore the associated design space and identify the pruning-quantization configuration that minimizes energy consumption while keeping the prediction accuracy loss at acceptable levels. Using our novel composite RL agent, we extract energy-efficient solutions without requiring retraining or fine-tuning. Our extensive experimental evaluation over widely used DNNs and the CIFAR-10/100 and ImageNet datasets demonstrates that our framework achieves a 39% average energy reduction for a 1.7% average accuracy loss and significantly outperforms state-of-the-art approaches.
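The joint compression scheme described in the abstract can be illustrated with a minimal sketch: a per-layer choice between fine-grained (individual-weight) and coarse-grained (whole-channel) pruning, followed by low bit-width uniform quantization of the surviving weights. The function names, the magnitude/L1 selection criteria, and the per-layer configuration tuple below are illustrative assumptions for exposition only, not the paper's actual framework or its composite RL agent.

```python
import numpy as np

def fine_grained_prune(w, sparsity):
    """Zero out the smallest-magnitude individual weights (fine-grained pruning)."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def coarse_grained_prune(w, sparsity):
    """Zero out whole output channels (rows) with the smallest L1 norm (coarse-grained pruning)."""
    k = int(sparsity * w.shape[0])
    pruned = w.copy()
    if k > 0:
        drop = np.argsort(np.abs(w).sum(axis=1))[:k]
        pruned[drop, :] = 0.0
    return pruned

def quantize(w, bits):
    """Symmetric uniform quantization of weights to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(w / scale).clip(-qmax, qmax) * scale

# Hypothetical per-layer configuration, of the kind an RL agent could emit:
# (pruning granularity, sparsity, weight bit-width)
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))          # stand-in for one layer's weights
cfg = ("fine", 0.5, 4)
pruner = fine_grained_prune if cfg[0] == "fine" else coarse_grained_prune
w_c = quantize(pruner(w, cfg[1]), cfg[2])
print(f"sparsity: {np.mean(w_c == 0):.2f}, unique levels: {len(np.unique(w_c))}")
```

In the full framework, such a configuration would be chosen per layer by the RL agent so that the resulting sparsity and bit-widths minimize accelerator energy subject to an accuracy constraint; here the tuple is fixed by hand purely to show the compression step itself.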
Pages: 1079-1092
Page count: 14