Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Cited by: 2
Authors
Balaskas, Konstantinos [1 ,2 ]
Karatzas, Andreas [3 ]
Sad, Christos [4 ]
Siozios, Kostas [4 ]
Anagnostopoulos, Iraklis [3 ]
Zervakis, Georgios [5 ]
Henkel, Jorg [6 ]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[2] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
[3] Southern Illinois Univ, Sch Elect Comp & Biomed Engn, Carbondale, IL 62901 USA
[4] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[5] Univ Patras, Dept Comp Engn & Informat, Patras 26504, Greece
[6] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
Funding
EU Horizon 2020
Keywords
Quantization (signal); Artificial neural networks; Hardware; Optimization; Energy efficiency; Energy consumption; Tuning; Deep neural networks (DNNs); DNN accelerators; DNN compression; energy efficiency; pruning; quantization; reinforcement learning;
DOI
10.1109/TETC.2023.3346944
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy-hungry at an exponential pace, while at the same time there is a vast demand for running sophisticated DNN-based services on resource-constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework that compresses DNNs in a hardware-aware manner by jointly employing pruning and quantization. We explore, for the first time, per-layer fine- and coarse-grained pruning within the same DNN architecture, in addition to low bit-width mixed-precision quantization for weights and activations. Reinforcement Learning (RL) is used to explore the associated design space and identify the pruning-quantization configuration that minimizes energy consumption while keeping the prediction accuracy loss at acceptable levels. Using our novel composite RL agent, we extract energy-efficient solutions without requiring retraining and/or fine-tuning. Our extensive experimental evaluation over widely used DNNs and the CIFAR-10/100 and ImageNet datasets demonstrates that our framework achieves 39% average energy reduction for 1.7% average accuracy loss and significantly outperforms state-of-the-art approaches.
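The two compression primitives the abstract combines, per-layer pruning and low bit-width quantization, can be sketched in minimal form as follows. This is an illustrative assumption, not the paper's implementation: the function names, the symmetric uniform quantizer, and the fixed single-layer configuration (50% unstructured sparsity, 4-bit weights) are hypothetical, and the paper additionally searches such per-layer configurations with an RL agent rather than fixing them by hand.

```python
import numpy as np

def prune_magnitude(w, sparsity):
    """Fine-grained (unstructured) pruning: zero the smallest-magnitude weights."""
    flat = np.abs(w).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return w.copy()
    thresh = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(w) <= thresh, 0.0, w)

def quantize_uniform(w, bits):
    """Symmetric uniform quantization to a given bit-width (fake-quantized floats)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(w / scale).clip(-qmax, qmax) * scale

# One hypothetical per-layer configuration: 50% sparsity, 4-bit weights.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
compressed = quantize_uniform(prune_magnitude(w, 0.5), 4)
print(np.mean(compressed == 0.0))  # pruning guarantees at least 50% zeros
```

In a search-based framework of this kind, the (sparsity, bits) pair would be chosen per layer by the RL agent against a hardware energy model, rather than hard-coded as above.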
Pages: 1079-1092 (14 pages)
Related papers
50 entries in total
  • [41] 3D CNN Acceleration on FPGA using Hardware-Aware Pruning
    Sun, Mengshu
    Zhao, Pu
    Gungor, Mehmet
    Pedram, Massoud
    Leeser, Miriam
    Lin, Xue
    PROCEEDINGS OF THE 2020 57TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2020,
  • [43] Mixed-Precision Tomographic Reconstructor Computations on Hardware Accelerators
    Doucet, Nicolas
    Ltaief, Hatem
    Gratadour, Damien
    Keyes, David
    2019 IEEE/ACM 9TH WORKSHOP ON IRREGULAR APPLICATIONS - ARCHITECTURES AND ALGORITHMS (IA3), 2019, : 31 - 38
  • [44] HFP: Hardware-Aware Filter Pruning for Deep Convolutional Neural Networks Acceleration
    Yu, Fang
    Han, Chuanqi
    Wang, Pengcheng
    Huang, Ruoran
    Huang, Xi
    Cui, Li
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 255 - 262
  • [45] MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization
    Zhu, Zeyu
    Li, Fanrong
    Li, Gang
    Liu, Zejian
    Mo, Zitao
    Hu, Qinghao
    Liang, Xiaoyao
    Cheng, Jian
    2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024, 2024, : 124 - 138
  • [46] COMPRIZE: Assessing the Fusion of Quantization and Compression on DNN Hardware Accelerators
    Patel, Vrajesh
    Shah, Neel
    Krishna, Aravind
    Glint, Tom
    Ronak, Abdul
    Mekie, Joycee
    PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON VLSI DESIGN, VLSID 2024 AND 23RD INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS, ES 2024, 2024, : 253 - 258
  • [47] CHAMP: Coherent Hardware-Aware Magnitude Pruning of Integrated Photonic Neural Networks
    Banerjee, Sanmitra
    Nikdast, Mahdi
    Pasricha, Sudeep
    Chakrabarty, Krishnendu
    2022 OPTICAL FIBER COMMUNICATIONS CONFERENCE AND EXHIBITION (OFC), 2022,
  • [48] Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators
    Klhufek, Jan
    Safar, Miroslav
    Mrazek, Vojtech
    Vasicek, Zdenek
    Sekanina, Lukas
    2024 27TH INTERNATIONAL SYMPOSIUM ON DESIGN & DIAGNOSTICS OF ELECTRONIC CIRCUITS & SYSTEMS, DDECS, 2024, : 1 - 6
  • [49] Block-Wise Mixed-Precision Quantization: Enabling High Efficiency for Practical ReRAM-Based DNN Accelerators
    Wu, Xueying
    Hanson, Edward
    Wang, Nansu
    Zheng, Qilin
    Yang, Xiaoxuan
    Yang, Huanrui
    Li, Shiyu
    Cheng, Feng
    Pande, Partha Pratim
    Doppa, Janardhan Rao
    Chakrabarty, Krishnendu
    Li, Hai
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2024, 43 (12) : 4558 - 4571
  • [50] Mixed-Precision Network Quantization for Infrared Small Target Segmentation
    Li, Boyang
    Wang, Longguang
    Wang, Yingqian
    Wu, Tianhao
    Lin, Zaiping
    Li, Miao
    An, Wei
    Guo, Yulan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 12