Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Cited by: 2
Authors
Balaskas, Konstantinos [1 ,2 ]
Karatzas, Andreas [3 ]
Sad, Christos [4 ]
Siozios, Kostas [4 ]
Anagnostopoulos, Iraklis [3 ]
Zervakis, Georgios [5 ]
Henkel, Jorg [6 ]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[2] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
[3] Southern Illinois Univ, Sch Elect Comp & Biomed Engn, Carbondale, IL 62901 USA
[4] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[5] Univ Patras, Dept Comp Engn & Informat, Patras 26504, Greece
[6] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
Funding
EU Horizon 2020
Keywords
Quantization (signal); Artificial neural networks; Hardware; Optimization; Energy efficiency; Energy consumption; Tuning; Deep neural networks (DNNs); DNN accelerators; DNN compression; energy efficiency; pruning; quantization; reinforcement learning;
DOI
10.1109/TETC.2023.3346944
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline code
0812
Abstract
Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy-hungry at an exponential pace, while at the same time there is vast demand for running sophisticated DNN-based services on resource-constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework that compresses DNNs in a hardware-aware manner by jointly employing pruning and quantization. We explore, for the first time, per-layer fine- and coarse-grained pruning within the same DNN architecture, in addition to low bit-width mixed-precision quantization for weights and activations. Reinforcement Learning (RL) is used to explore the associated design space and identify the pruning-quantization configuration that minimizes energy consumption while keeping the prediction accuracy loss at acceptable levels. Using our novel composite RL agent, we extract energy-efficient solutions without requiring retraining or fine-tuning. Our extensive experimental evaluation over widely used DNNs and the CIFAR-10/100 and ImageNet datasets demonstrates that our framework achieves 39% average energy reduction for 1.7% average accuracy loss and significantly outperforms state-of-the-art approaches.
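As a rough illustration of the per-layer compression step the abstract describes (fine-grained magnitude pruning followed by uniform symmetric quantization), the sketch below shows what a single layer's transformation could look like. This is not the authors' implementation; the function name and parameters are hypothetical, and in the paper an RL agent selects the (pruning ratio, bit-width) pair per layer to minimize energy under an accuracy constraint.

```python
import numpy as np

def prune_and_quantize(weights, prune_ratio, n_bits):
    """Illustrative per-layer compression: fine-grained magnitude pruning,
    then uniform symmetric quantization to n_bits (one bit for sign)."""
    w = weights.copy()
    # Fine-grained pruning: zero out the smallest-magnitude weights.
    k = int(prune_ratio * w.size)
    if k > 0:
        threshold = np.sort(np.abs(w), axis=None)[k - 1]
        w[np.abs(w) <= threshold] = 0.0
    # Uniform symmetric quantization: map weights onto a signed
    # (2^(n_bits-1) - 1)-level grid scaled to the layer's max magnitude.
    max_abs = np.abs(w).max()
    scale = max_abs / (2 ** (n_bits - 1) - 1) if max_abs > 0 else 1.0
    return np.round(w / scale) * scale
```

A per-layer configuration chosen by the search would then be a list of (prune_ratio, n_bits) pairs, one per layer, applied with a function like the above.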
Pages: 1079-1092 (14 pages)
Related papers
50 records total
  • [31] Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks
    Christian Heidorn
    Muhammad Sabih
    Nicolai Meyerhöfer
    Christian Schinabeck
    Jürgen Teich
    Frank Hannig
    International Journal of Parallel Programming, 2024, 52 : 40 - 58
  • [32] One-Shot Model for Mixed-Precision Quantization
    Koryakovskiy, Ivan
    Yakovleva, Alexandra
    Buchnev, Valentin
    Isaev, Temur
    Odinokikh, Gleb
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7939 - 7949
  • [33] Control-free and efficient integrated photonic neural networks via hardware-aware training and pruning
    Xu, Tengji
    Zhang, Weipeng
    Zhang, Jiawei
    Luo, Zeyu
    Xiao, Qiarong
    Wang, Benshan
    Luo, Mingcheng
    Xu, Xingyuan
    Shastri, Bhavin J.
    Prucnal, Paul R.
    Huang, Chaoran
    OPTICA, 2024, 11 (08): : 1039 - 1049
  • [34] g-BERT: Enabling Green BERT Deployment on FPGA via Hardware-Aware Hybrid Pruning
    Bai, Yueyin
    Zhou, Hao
    Chen, Ruiqi
    Zou, Kuangjie
    Cao, Jialin
    Zhang, Haoyang
    Chen, Jianli
    Yu, Jun
    Wang, Kun
    ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 1706 - 1711
  • [35] CSMPQ: Class Separability Based Mixed-Precision Quantization
    Wang, Mingkai
    Jin, Taisong
    Zhang, Miaohui
    Yu, Zhengtao
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT I, 2023, 14086 : 544 - 555
  • [36] AMED: Automatic Mixed-Precision Quantization for Edge Devices
    Kimhi, Moshe
    Rozen, Tal
    Mendelson, Avi
    Baskin, Chaim
    MATHEMATICS, 2024, 12 (12)
  • [37] Hardware-aware Quantization/Mapping Strategies for Compute-in-Memory Accelerators
    Huang, Shanshi
    Jiang, Hongwu
    Yu, Shimeng
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2023, 28 (03)
  • [38] Hardware for Quantized Mixed-Precision Deep Neural Networks
    Rios, Andres
    Nava, Patricia
    PROCEEDINGS OF THE 2022 15TH IEEE DALLAS CIRCUITS AND SYSTEMS CONFERENCE (DCAS 2022), 2022,
  • [39] Explicit Model Size Control and Relaxation via Smooth Regularization for Mixed-Precision Quantization
    Chikin, Vladimir
    Solodskikh, Kirill
    Zhelavskaya, Irina
    COMPUTER VISION, ECCV 2022, PT XII, 2022, 13672 : 1 - 16
  • [40] Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance
    Tang, Chen
    Ouyang, Kai
    Wang, Zhi
    Zhu, Yifei
    Ji, Wen
    Wang, Yaowei
    Zhu, Wenwu
    COMPUTER VISION, ECCV 2022, PT XI, 2022, 13671 : 259 - 275