Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Cited by: 2
Authors
Balaskas, Konstantinos [1 ,2 ]
Karatzas, Andreas [3 ]
Sad, Christos [4 ]
Siozios, Kostas [4 ]
Anagnostopoulos, Iraklis [3 ]
Zervakis, Georgios [5 ]
Henkel, Jorg [6 ]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[2] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
[3] Southern Illinois Univ, Sch Elect Comp & Biomed Engn, Carbondale, IL 62901 USA
[4] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[5] Univ Patras, Dept Comp Engn & Informat, Patras 26504, Greece
[6] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
Funding
EU Horizon 2020
Keywords
Quantization (signal); Artificial neural networks; Hardware; Optimization; Energy efficiency; Energy consumption; Tuning; Deep neural networks (DNNs); DNN accelerators; DNN compression; energy efficiency; pruning; quantization; reinforcement learning;
DOI
10.1109/TETC.2023.3346944
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline code
0812
Abstract
Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy-hungry at an exponential pace, while at the same time there is vast demand for running sophisticated DNN-based services on resource-constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework that compresses DNNs in a hardware-aware manner by jointly employing pruning and quantization. We explore, for the first time, per-layer fine- and coarse-grained pruning within the same DNN architecture, in addition to low bit-width mixed-precision quantization for weights and activations. Reinforcement Learning (RL) is used to explore the associated design space and identify the pruning-quantization configuration that minimizes energy consumption while keeping the prediction accuracy loss at acceptable levels. Using our novel composite RL agent, we extract energy-efficient solutions without requiring retraining or fine-tuning. Our extensive experimental evaluation over widely used DNNs and the CIFAR-10/100 and ImageNet datasets demonstrates that our framework achieves 39% average energy reduction for 1.7% average accuracy loss and significantly outperforms state-of-the-art approaches.
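As a rough illustration of the per-layer compression step the abstract describes (fine-grained magnitude pruning followed by uniform symmetric quantization), the sketch below shows what a single layer's transformation could look like. This is not the authors' implementation; the function name and parameters are hypothetical, and in the paper an RL agent selects the (pruning ratio, bit-width) pair per layer to minimize energy under an accuracy constraint.

```python
import numpy as np

def prune_and_quantize(weights, prune_ratio, n_bits):
    """Illustrative per-layer compression: fine-grained magnitude pruning,
    then uniform symmetric quantization to n_bits (one bit for sign)."""
    w = weights.copy()
    # Fine-grained pruning: zero out the smallest-magnitude weights.
    k = int(prune_ratio * w.size)
    if k > 0:
        threshold = np.sort(np.abs(w), axis=None)[k - 1]
        w[np.abs(w) <= threshold] = 0.0
    # Uniform symmetric quantization: map weights onto a signed
    # (2^(n_bits-1) - 1)-level grid scaled to the layer's max magnitude.
    max_abs = np.abs(w).max()
    scale = max_abs / (2 ** (n_bits - 1) - 1) if max_abs > 0 else 1.0
    return np.round(w / scale) * scale
```

A per-layer configuration chosen by the search would then be a list of (prune_ratio, n_bits) pairs, one per layer, applied with a function like the above.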
Pages: 1079-1092 (14 pages)
Related papers
50 records total
  • [31] Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks
    Christian Heidorn
    Muhammad Sabih
    Nicolai Meyerhöfer
    Christian Schinabeck
    Jürgen Teich
    Frank Hannig
    International Journal of Parallel Programming, 2024, 52 : 40 - 58
  • [32] One-Shot Model for Mixed-Precision Quantization
    Koryakovskiy, Ivan
    Yakovleva, Alexandra
    Buchnev, Valentin
    Isaev, Temur
    Odinokikh, Gleb
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7939 - 7949
  • [33] Control-free and efficient integrated photonic neural networks via hardware-aware training and pruning
    Xu, Tengji
    Zhang, Weipeng
    Zhang, Jiawei
    Luo, Zeyu
    Xiao, Qiarong
    Wang, Benshan
    Luo, Mingcheng
    Xu, Xingyuan
    Shastri, Bhavin J.
    Prucnal, Paul R.
    Huang, Chaoran
    OPTICA, 2024, 11 (08): : 1039 - 1049
  • [34] g-BERT: Enabling Green BERT Deployment on FPGA via Hardware-Aware Hybrid Pruning
    Bai, Yueyin
    Zhou, Hao
    Chen, Ruiqi
    Zou, Kuangjie
    Cao, Jialin
    Zhang, Haoyang
    Chen, Jianli
    Yu, Jun
    Wang, Kun
    ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 1706 - 1711
  • [35] CSMPQ: Class Separability Based Mixed-Precision Quantization
    Wang, Mingkai
    Jin, Taisong
    Zhang, Miaohui
    Yu, Zhengtao
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT I, 2023, 14086 : 544 - 555
  • [36] AMED: Automatic Mixed-Precision Quantization for Edge Devices
    Kimhi, Moshe
    Rozen, Tal
    Mendelson, Avi
    Baskin, Chaim
    MATHEMATICS, 2024, 12 (12)
  • [37] Hardware-aware Quantization/Mapping Strategies for Compute-in-Memory Accelerators
    Huang, Shanshi
    Jiang, Hongwu
    Yu, Shimeng
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2023, 28 (03)
  • [38] Hardware for Quantized Mixed-Precision Deep Neural Networks
    Rios, Andres
    Nava, Patricia
    PROCEEDINGS OF THE 2022 15TH IEEE DALLAS CIRCUITS AND SYSTEMS CONFERENCE (DCAS 2022), 2022,
  • [39] Explicit Model Size Control and Relaxation via Smooth Regularization for Mixed-Precision Quantization
    Chikin, Vladimir
    Solodskikh, Kirill
    Zhelavskaya, Irina
    COMPUTER VISION, ECCV 2022, PT XII, 2022, 13672 : 1 - 16
  • [40] Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance
    Tang, Chen
    Ouyang, Kai
    Wang, Zhi
    Zhu, Yifei
    Ji, Wen
    Wang, Yaowei
    Zhu, Wenwu
    COMPUTER VISION, ECCV 2022, PT XI, 2022, 13671 : 259 - 275