Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Cited by: 2
Authors
Balaskas, Konstantinos [1 ,2 ]
Karatzas, Andreas [3 ]
Sad, Christos [4 ]
Siozios, Kostas [4 ]
Anagnostopoulos, Iraklis [3 ]
Zervakis, Georgios [5 ]
Henkel, Jorg [6 ]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[2] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
[3] Southern Illinois Univ, Sch Elect Comp & Biomed Engn, Carbondale, IL 62901 USA
[4] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[5] Univ Patras, Dept Comp Engn & Informat, Patras 26504, Greece
[6] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
Funding
EU Horizon 2020
Keywords
Quantization (signal); Artificial neural networks; Hardware; Optimization; Energy efficiency; Energy consumption; Tuning; Deep neural networks (DNNs); DNN accelerators; DNN compression; energy efficiency; pruning; quantization; reinforcement learning;
DOI
10.1109/TETC.2023.3346944
CLC Number
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy-hungry at an exponential pace, while at the same time there is vast demand for running sophisticated DNN-based services on resource-constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework that compresses DNNs in a hardware-aware manner by jointly employing pruning and quantization. We explore, for the first time, per-layer fine- and coarse-grained pruning within the same DNN architecture, in addition to low bit-width mixed-precision quantization for weights and activations. Reinforcement Learning (RL) is used to explore the associated design space and identify the pruning-quantization configuration that minimizes energy consumption while keeping the prediction accuracy loss at acceptable levels. Using our novel composite RL agent, we extract energy-efficient solutions without requiring retraining or fine-tuning. Our extensive experimental evaluation over widely used DNNs and the CIFAR-10/100 and ImageNet datasets demonstrates that our framework achieves a 39% average energy reduction for a 1.7% average accuracy loss and significantly outperforms state-of-the-art approaches.
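The joint compression scheme described in the abstract can be illustrated with a minimal sketch: a per-layer choice between fine-grained (individual-weight) and coarse-grained (whole-channel) pruning, followed by low bit-width uniform quantization of the surviving weights. The function names, the magnitude/L1 selection criteria, and the per-layer configuration tuple below are illustrative assumptions for exposition only, not the paper's actual framework or its composite RL agent.

```python
import numpy as np

def fine_grained_prune(w, sparsity):
    """Zero out the smallest-magnitude individual weights (fine-grained pruning)."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def coarse_grained_prune(w, sparsity):
    """Zero out whole output channels (rows) with the smallest L1 norm (coarse-grained pruning)."""
    k = int(sparsity * w.shape[0])
    pruned = w.copy()
    if k > 0:
        drop = np.argsort(np.abs(w).sum(axis=1))[:k]
        pruned[drop, :] = 0.0
    return pruned

def quantize(w, bits):
    """Symmetric uniform quantization of weights to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(w / scale).clip(-qmax, qmax) * scale

# Hypothetical per-layer configuration, of the kind an RL agent could emit:
# (pruning granularity, sparsity, weight bit-width)
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))          # stand-in for one layer's weights
cfg = ("fine", 0.5, 4)
pruner = fine_grained_prune if cfg[0] == "fine" else coarse_grained_prune
w_c = quantize(pruner(w, cfg[1]), cfg[2])
print(f"sparsity: {np.mean(w_c == 0):.2f}, unique levels: {len(np.unique(w_c))}")
```

In the full framework, such a configuration would be chosen per layer by the RL agent so that the resulting sparsity and bit-widths minimize accelerator energy subject to an accuracy constraint; here the tuple is fixed by hand purely to show the compression step itself.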
Pages: 1079-1092
Page count: 14