Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Cited by: 2
Authors
Balaskas, Konstantinos [1 ,2 ]
Karatzas, Andreas [3 ]
Sad, Christos [4 ]
Siozios, Kostas [4 ]
Anagnostopoulos, Iraklis [3 ]
Zervakis, Georgios [5 ]
Henkel, Jörg [6]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[2] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
[3] Southern Illinois Univ, Sch Elect Comp & Biomed Engn, Carbondale, IL 62901 USA
[4] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[5] Univ Patras, Dept Comp Engn & Informat, Patras 26504, Greece
[6] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
Funding
EU Horizon 2020
Keywords
Quantization (signal); Artificial neural networks; Hardware; Optimization; Energy efficiency; Energy consumption; Tuning; Deep neural networks (DNNs); DNN accelerators; DNN compression; energy efficiency; pruning; quantization; reinforcement learning;
DOI
10.1109/TETC.2023.3346944
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy-hungry at an exponential pace, while at the same time there is a vast demand for running sophisticated DNN-based services on resource-constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework that compresses DNNs in a hardware-aware manner by jointly employing pruning and quantization. For the first time, we explore per-layer fine- and coarse-grained pruning within the same DNN architecture, in addition to low bit-width mixed-precision quantization for weights and activations. Reinforcement Learning (RL) is used to explore the associated design space and identify a pruning-quantization configuration that minimizes energy consumption while keeping the prediction accuracy loss at acceptable levels. Using our novel composite RL agent, we are able to extract energy-efficient solutions without requiring retraining and/or fine-tuning. Our extensive experimental evaluation on widely used DNNs and the CIFAR-10/100 and ImageNet datasets demonstrates that our framework achieves a 39% average energy reduction for a 1.7% average accuracy loss and significantly outperforms state-of-the-art approaches.
Pages: 1079-1092 (14 pages)
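
The abstract describes a composite search over per-layer pruning type and ratio together with mixed-precision bit-widths, driven by an RL agent that trades energy against accuracy. As a rough, illustrative sketch only (not the authors' implementation), the short NumPy snippet below shows how one candidate configuration, a hypothetical action with fields mode, ratio, and bits, could be applied to a single layer and scored with a toy energy proxy; the function names, the magnitude-based pruning criteria, the symmetric uniform quantizer, and the cost model are all assumptions made for illustration.

# Illustrative sketch (assumptions only): apply one candidate pruning/quantization
# configuration to a single layer's weight matrix and score it with a toy cost.
import numpy as np

def prune_weights(w, ratio, mode="fine"):
    # Zero out a fraction `ratio` of w: per-element ("fine", unstructured)
    # or whole rows ("coarse", structured, e.g. filters of a conv layer).
    w = w.copy()
    if mode == "fine":
        k = int(ratio * w.size)
        if k > 0:
            thresh = np.sort(np.abs(w), axis=None)[k - 1]
            w[np.abs(w) <= thresh] = 0.0
    else:
        k = int(ratio * w.shape[0])
        if k > 0:
            norms = np.linalg.norm(w, axis=1)
            w[np.argsort(norms)[:k], :] = 0.0
    return w

def quantize_weights(w, bits):
    # Symmetric uniform quantization to `bits` bits (sketch only).
    max_abs = np.max(np.abs(w))
    scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0
    return np.round(w / scale) * scale

def energy_proxy(w, bits):
    # Toy cost model: count of nonzero weights scaled by bit-width.
    # A real hardware-aware framework would query an accelerator energy model instead.
    return int(np.count_nonzero(w)) * bits

# Evaluate one hypothetical configuration for a random 64x128 layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 128))
config = {"mode": "coarse", "ratio": 0.5, "bits": 4}  # e.g. one per-layer RL action
w_c = quantize_weights(prune_weights(w, config["ratio"], config["mode"]), config["bits"])
print("nonzero weights:", np.count_nonzero(w_c), "| energy proxy:", energy_proxy(w_c, config["bits"]))

In the paper's setting, an RL agent would propose such a configuration for every layer and receive feedback combining the resulting accuracy loss and the accelerator's energy estimate; the snippet above only mimics the per-layer application step under the stated assumptions.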