Dynamic Split Computing-Aware Mixed-Precision Quantization for Efficient Deep Edge Intelligence

Cited by: 0
Authors
Nagamatsu, Naoki [1]
Hara-Azumi, Yuko [1]
Affiliations
[1] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan
Keywords
Deep Neural Networks; Split Computing; Mixed-Precision Quantization; Neural Architecture Search
DOI
10.1109/TrustCom60117.2023.00355
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Deploying large deep neural networks (DNNs) on IoT and mobile devices poses a significant challenge due to hardware resource limitations. To address this challenge, an edge-cloud integration technique called split computing (SC) is attractive for improving inference time by splitting a single DNN model into two sub-models processed on an edge device and a server, respectively. Dynamic split computing (DSC) is an emerging extension of SC that dynamically determines the split point depending on the communication conditions. In this work, we propose a DNN architecture optimization method for DSC. Our contributions are twofold. (1) First, we develop a DSC-aware mixed-precision quantization method exploiting neural architecture search (NAS). Through NAS, we efficiently explore the optimal bitwidth of each layer from a huge design space to construct potential split points in the target DNN: the more potential split points there are, the more flexibly the DNN architecture can select a split point according to the communication conditions. (2) Second, to improve the end-to-end inference time, we propose a new bitwidth-wise DSC (BW-DSC) algorithm that dynamically determines the optimal split point among the potential split points in the mixed-precision quantized DNN architecture. Our evaluation demonstrated that our method provides more effective split points than existing works while mitigating inference accuracy degradation. In terms of end-to-end inference time, our method achieved an average of 16.47% and up to 24.36% improvement compared with a state-of-the-art work.
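The split-point selection idea described in the abstract can be pictured with a minimal latency model: for each candidate split point, the end-to-end time is the edge-side compute time up to the split, plus the time to transmit the quantized intermediate activation over the current link, plus the server-side compute time for the remaining layers; the candidate with the smallest estimate is chosen. The Python sketch below illustrates only this selection rule under assumed per-layer measurements; the names (Layer, end_to_end_ms, select_split) and numbers are hypothetical, and this is not the authors' BW-DSC algorithm.

# Minimal sketch of bandwidth-dependent split-point selection (illustrative only).
from dataclasses import dataclass
from typing import List

@dataclass
class Layer:
    edge_ms: float       # assumed latency of this layer on the edge device
    server_ms: float     # assumed latency of this layer on the server
    out_elements: int    # number of elements in the layer's output activation
    out_bits: int        # quantized bitwidth of that activation

def end_to_end_ms(layers: List[Layer], split: int, bandwidth_mbps: float) -> float:
    """Estimate latency when layers[:split] run on the edge and layers[split:] on the server."""
    edge = sum(l.edge_ms for l in layers[:split])
    server = sum(l.server_ms for l in layers[split:])
    if split == len(layers):
        comm = 0.0  # whole model stays on the edge device: nothing to transmit
    else:
        t = layers[split - 1]  # activation crossing the split is sent over the network
        comm = t.out_elements * t.out_bits / (bandwidth_mbps * 1e6) * 1e3  # ms
    return edge + comm + server

def select_split(layers: List[Layer], candidates: List[int], bandwidth_mbps: float) -> int:
    """Among the potential split points, return the one with the lowest estimated latency."""
    return min(candidates, key=lambda s: end_to_end_ms(layers, s, bandwidth_mbps))

# Example with made-up measurements for a 4-layer model and 8-/4-bit activations.
model = [
    Layer(2.0, 0.4, 32 * 32 * 16, 8),
    Layer(3.0, 0.6, 16 * 16 * 32, 4),
    Layer(4.0, 0.8, 8 * 8 * 64, 4),
    Layer(1.0, 0.2, 10, 8),
]
print(select_split(model, candidates=[1, 2, 3, 4], bandwidth_mbps=5.0))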
Pages: 2538-2545
Number of pages: 8
Related Papers
44 in total
  • [1] Mixed-Precision Neural Architecture Search and Dynamic Split Point Selection for Split Computing
    Nagamatsu, Naoki
    Ise, Kenshiro
    Hara, Yuko
    IEEE ACCESS, 2024, 12 : 137439 - 137454
  • [2] AMED: Automatic Mixed-Precision Quantization for Edge Devices
    Kimhi, Moshe
    Rozen, Tal
    Mendelson, Avi
    Baskin, Chaim
    MATHEMATICS, 2024, 12 (12)
  • [3] HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
    Dong, Zhen
    Yao, Zhewei
    Gholami, Amir
    Mahoney, Michael W.
    Keutzer, Kurt
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 293 - 302
  • [4] Mixed-precision quantization-aware training for photonic neural networks
    Kirtas, Manos
    Passalis, Nikolaos
    Oikonomou, Athina
    Moralis-Pegios, Miltos
    Giamougiannis, George
    Tsakyridis, Apostolos
    Mourgias-Alexandris, George
    Pleros, Nikolaos
    Tefas, Anastasios
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (29): : 21361 - 21379
  • [5] Joint Pruning and Channel-Wise Mixed-Precision Quantization for Efficient Deep Neural Networks
    Motetti, Beatrice Alessandra
    Risso, Matteo
    Burrello, Alessio
    Macii, Enrico
    Poncino, Massimo
    Pagliari, Daniele Jahier
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (11) : 2619 - 2633
  • [6] MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization
    Zhu, Zeyu
    Li, Fanrong
    Li, Gang
    Liu, Zejian
    Mo, Zitao
    Hu, Qinghao
    Liang, Xiaoyao
    Cheng, Jian
    2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024, 2024, : 124 - 138
  • [7] Edge-MPQ: Layer-Wise Mixed-Precision Quantization With Tightly Integrated Versatile Inference Units for Edge Computing
    Zhao, Xiaotian
    Xu, Ruge
    Gao, Yimin
    Verma, Vaibhav
    Stan, Mircea R.
    Guo, Xinfei
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (11) : 2504 - 2519
  • [8] Mixed-precision Deep Neural Network Quantization With Multiple Compression Rates
    Wang, Xuanda
    Fei, Wen
    Dai, Wenrui
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    2023 DATA COMPRESSION CONFERENCE, DCC, 2023, : 371 - 371