Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Cited by: 2
Authors
Balaskas, Konstantinos [1 ,2 ]
Karatzas, Andreas [3 ]
Sad, Christos [4 ]
Siozios, Kostas [4 ]
Anagnostopoulos, Iraklis [3 ]
Zervakis, Georgios [5 ]
Henkel, Jorg [6 ]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[2] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
[3] Southern Illinois Univ, Sch Elect Comp & Biomed Engn, Carbondale, IL 62901 USA
[4] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[5] Univ Patras, Dept Comp Engn & Informat, Patras 26504, Greece
[6] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
Funding
EU Horizon 2020
Keywords
Quantization (signal); Artificial neural networks; Hardware; Optimization; Energy efficiency; Energy consumption; Tuning; Deep neural networks (DNNs); DNN accelerators; DNN compression; energy efficiency; pruning; quantization; reinforcement learning;
DOI
10.1109/TETC.2023.3346944
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy-hungry at an exponential pace, while at the same time there is a vast demand for running sophisticated DNN-based services on resource-constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework that compresses DNNs in a hardware-aware manner by jointly employing pruning and quantization. We explore, for the first time, per-layer fine- and coarse-grained pruning within the same DNN architecture, in addition to low bit-width mixed-precision quantization for weights and activations. Reinforcement Learning (RL) is used to explore the associated design space and identify the pruning-quantization configuration that minimizes energy consumption while keeping the prediction accuracy loss at acceptable levels. Using our novel composite RL agent, we extract energy-efficient solutions without requiring retraining and/or fine-tuning. Our extensive experimental evaluation over widely used DNNs and the CIFAR-10/100 and ImageNet datasets demonstrates that our framework achieves 39% average energy reduction for 1.7% average accuracy loss and significantly outperforms state-of-the-art approaches.
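The two compression primitives the abstract combines, per-layer pruning and low bit-width quantization, can be sketched in minimal form as follows. This is an illustrative assumption, not the paper's implementation: the function names, the symmetric uniform quantizer, and the fixed single-layer configuration (50% unstructured sparsity, 4-bit weights) are hypothetical, and the paper additionally searches such per-layer configurations with an RL agent rather than fixing them by hand.

```python
import numpy as np

def prune_magnitude(w, sparsity):
    """Fine-grained (unstructured) pruning: zero the smallest-magnitude weights."""
    flat = np.abs(w).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return w.copy()
    thresh = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(w) <= thresh, 0.0, w)

def quantize_uniform(w, bits):
    """Symmetric uniform quantization to a given bit-width (fake-quantized floats)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(w / scale).clip(-qmax, qmax) * scale

# One hypothetical per-layer configuration: 50% sparsity, 4-bit weights.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
compressed = quantize_uniform(prune_magnitude(w, 0.5), 4)
print(np.mean(compressed == 0.0))  # pruning guarantees at least 50% zeros
```

In a search-based framework of this kind, the (sparsity, bits) pair would be chosen per layer by the RL agent against a hardware energy model, rather than hard-coded as above.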
Pages: 1079-1092 (14 pages)
Related papers
50 entries in total
  • [41] 3D CNN Acceleration on FPGA using Hardware-Aware Pruning
    Sun, Mengshu
    Zhao, Pu
    Gungor, Mehmet
    Pedram, Massoud
    Leeser, Miriam
    Lin, Xue
    PROCEEDINGS OF THE 2020 57TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2020,
  • [43] Mixed-Precision Tomographic Reconstructor Computations on Hardware Accelerators
    Doucet, Nicolas
    Ltaief, Hatem
    Gratadour, Damien
    Keyes, David
    2019 IEEE/ACM 9TH WORKSHOP ON IRREGULAR APPLICATIONS - ARCHITECTURES AND ALGORITHMS (IA3), 2019, : 31 - 38
  • [44] HFP: Hardware-Aware Filter Pruning for Deep Convolutional Neural Networks Acceleration
    Yu, Fang
    Han, Chuanqi
    Wang, Pengcheng
    Huang, Ruoran
    Huang, Xi
    Cui, Li
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 255 - 262
  • [45] MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization
    Zhu, Zeyu
    Li, Fanrong
    Li, Gang
    Liu, Zejian
    Mo, Zitao
    Hu, Qinghao
    Liang, Xiaoyao
    Cheng, Jian
    2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024, 2024, : 124 - 138
  • [46] COMPRIZE: Assessing the Fusion of Quantization and Compression on DNN Hardware Accelerators
    Patel, Vrajesh
    Shah, Neel
    Krishna, Aravind
    Glint, Tom
    Ronak, Abdul
    Mekie, Joycee
    PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON VLSI DESIGN, VLSID 2024 AND 23RD INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS, ES 2024, 2024, : 253 - 258
  • [47] CHAMP: Coherent Hardware-Aware Magnitude Pruning of Integrated Photonic Neural Networks
    Banerjee, Sanmitra
    Nikdast, Mahdi
    Pasricha, Sudeep
    Chakrabarty, Krishnendu
    2022 OPTICAL FIBER COMMUNICATIONS CONFERENCE AND EXHIBITION (OFC), 2022,
  • [48] Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators
    Klhufek, Jan
    Safar, Miroslav
    Mrazek, Vojtech
    Vasicek, Zdenek
    Sekanina, Lukas
    2024 27TH INTERNATIONAL SYMPOSIUM ON DESIGN & DIAGNOSTICS OF ELECTRONIC CIRCUITS & SYSTEMS, DDECS, 2024, : 1 - 6
  • [49] Block-Wise Mixed-Precision Quantization: Enabling High Efficiency for Practical ReRAM-Based DNN Accelerators
    Wu, Xueying
    Hanson, Edward
    Wang, Nansu
    Zheng, Qilin
    Yang, Xiaoxuan
    Yang, Huanrui
    Li, Shiyu
    Cheng, Feng
    Pande, Partha Pratim
    Doppa, Janardhan Rao
    Chakrabarty, Krishnendu
    Li, Hai
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2024, 43 (12) : 4558 - 4571
  • [50] Mixed-Precision Network Quantization for Infrared Small Target Segmentation
    Li, Boyang
    Wang, Longguang
    Wang, Yingqian
    Wu, Tianhao
    Lin, Zaiping
    Li, Miao
    An, Wei
    Guo, Yulan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 12