Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Cited by: 2
Authors
Balaskas, Konstantinos [1 ,2 ]
Karatzas, Andreas [3 ]
Sad, Christos [4 ]
Siozios, Kostas [4 ]
Anagnostopoulos, Iraklis [3 ]
Zervakis, Georgios [5 ]
Henkel, Jörg [6]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[2] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
[3] Southern Illinois Univ, Sch Elect Comp & Biomed Engn, Carbondale, IL 62901 USA
[4] Aristotle Univ Thessaloniki, Dept Phys, Thessaloniki 54124, Greece
[5] Univ Patras, Dept Comp Engn & Informat, Patras 26504, Greece
[6] Karlsruhe Inst Technol, Chair Embedded Syst, D-76131 Karlsruhe, Germany
Funding
EU Horizon 2020
Keywords
Quantization (signal); Artificial neural networks; Hardware; Optimization; Energy efficiency; Energy consumption; Tuning; Deep neural networks (DNNs); DNN accelerators; DNN compression; energy efficiency; pruning; quantization; reinforcement learning;
DOI
10.1109/TETC.2023.3346944
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy-hungry at an exponential pace, while at the same time there is a vast demand for running sophisticated DNN-based services on resource-constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework that compresses DNNs in a hardware-aware manner by jointly employing pruning and quantization. For the first time, we explore per-layer fine- and coarse-grained pruning within the same DNN architecture, in addition to low bit-width mixed-precision quantization for weights and activations. Reinforcement Learning (RL) is used to explore the associated design space and identify a pruning-quantization configuration that minimizes energy consumption while keeping the prediction accuracy loss at acceptable levels. Using our novel composite RL agent, we are able to extract energy-efficient solutions without requiring retraining and/or fine-tuning. Our extensive experimental evaluation on widely used DNNs and the CIFAR-10/100 and ImageNet datasets demonstrates that our framework achieves a 39% average energy reduction for a 1.7% average accuracy loss and significantly outperforms state-of-the-art approaches.
Pages: 1079-1092 (14 pages)
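
The abstract describes a composite search over per-layer pruning type and ratio together with mixed-precision bit-widths, driven by an RL agent that trades energy against accuracy. As a rough, illustrative sketch only (not the authors' implementation), the short NumPy snippet below shows how one candidate configuration, a hypothetical action with fields mode, ratio, and bits, could be applied to a single layer and scored with a toy energy proxy; the function names, the magnitude-based pruning criteria, the symmetric uniform quantizer, and the cost model are all assumptions made for illustration.

# Illustrative sketch (assumptions only): apply one candidate pruning/quantization
# configuration to a single layer's weight matrix and score it with a toy cost.
import numpy as np

def prune_weights(w, ratio, mode="fine"):
    # Zero out a fraction `ratio` of w: per-element ("fine", unstructured)
    # or whole rows ("coarse", structured, e.g. filters of a conv layer).
    w = w.copy()
    if mode == "fine":
        k = int(ratio * w.size)
        if k > 0:
            thresh = np.sort(np.abs(w), axis=None)[k - 1]
            w[np.abs(w) <= thresh] = 0.0
    else:
        k = int(ratio * w.shape[0])
        if k > 0:
            norms = np.linalg.norm(w, axis=1)
            w[np.argsort(norms)[:k], :] = 0.0
    return w

def quantize_weights(w, bits):
    # Symmetric uniform quantization to `bits` bits (sketch only).
    max_abs = np.max(np.abs(w))
    scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0
    return np.round(w / scale) * scale

def energy_proxy(w, bits):
    # Toy cost model: count of nonzero weights scaled by bit-width.
    # A real hardware-aware framework would query an accelerator energy model instead.
    return int(np.count_nonzero(w)) * bits

# Evaluate one hypothetical configuration for a random 64x128 layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 128))
config = {"mode": "coarse", "ratio": 0.5, "bits": 4}  # e.g. one per-layer RL action
w_c = quantize_weights(prune_weights(w, config["ratio"], config["mode"]), config["bits"])
print("nonzero weights:", np.count_nonzero(w_c), "| energy proxy:", energy_proxy(w_c, config["bits"]))

In the paper's setting, an RL agent would propose such a configuration for every layer and receive feedback combining the resulting accuracy loss and the accelerator's energy estimate; the snippet above only mimics the per-layer application step under the stated assumptions.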