Dynamic Split Computing-Aware Mixed-Precision Quantization for Efficient Deep Edge Intelligence

Cited by: 0
Authors
Nagamatsu, Naoki [1 ]
Hara-Azumi, Yuko [1 ]
Affiliations
[1] Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8552, Japan
Keywords
Deep Neural Networks; Split Computing; Mixed-Precision Quantization; Neural Architecture Search
DOI
10.1109/TrustCom60117.2023.00355
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deploying large deep neural networks (DNNs) on IoT and mobile devices is challenging due to hardware resource limitations. To address this challenge, an edge-cloud integration technique called split computing (SC) is attractive: it improves inference time by splitting a single DNN model into two sub-models, one processed on an edge device and the other on a server. Dynamic split computing (DSC) is an emerging extension of SC that dynamically determines the split point depending on the communication conditions. In this work, we propose a DNN architecture optimization method for DSC. Our contributions are twofold. (1) First, we develop a DSC-aware mixed-precision quantization method that exploits neural architecture search (NAS). Through NAS, we efficiently explore the optimal bitwidth of each layer in a huge design space to construct potential split points in the target DNN; the more potential split points a DNN has, the more flexibly it can select one depending on the communication conditions. (2) Second, to improve end-to-end inference time, we propose a new bitwidth-wise DSC (BW-DSC) algorithm that dynamically determines the optimal split point among the potential split points in the mixed-precision quantized DNN. Our evaluation demonstrates that our method provides more effective split points than existing works while mitigating inference accuracy degradation. In terms of end-to-end inference time, it achieves an average improvement of 16.47%, and up to 24.36%, over a state-of-the-art work.
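The BW-DSC step described in the abstract amounts to minimizing an additive end-to-end latency model over candidate split points, where the transmission term shrinks with the activation bitwidth that the NAS assigned to the split layer. Below is a minimal Python sketch of that selection step, assuming a simple linear bandwidth model; every name, signature, and profiling number here is a hypothetical illustration, not the paper's actual implementation.

from dataclasses import dataclass

@dataclass
class Layer:
    # All fields are assumed profiling inputs, not the authors' data model.
    name: str
    edge_ms: float        # measured compute time of this layer on the edge device
    server_ms: float      # measured compute time of this layer on the server
    out_elems: int        # number of elements in this layer's output activation
    act_bits: int         # activation bitwidth chosen by the mixed-precision NAS

def transfer_ms(elems: int, bits: int, bandwidth_mbps: float) -> float:
    # Time to ship a quantized activation tensor over the link,
    # under an assumed ideal linear bandwidth model.
    return (elems * bits) / (bandwidth_mbps * 1e6) * 1e3

def best_split(layers: list[Layer], candidates: list[int],
               bandwidth_mbps: float) -> tuple[int, float]:
    # Splitting after layer i means layers [0..i] run on the edge,
    # layer i's quantized activation is transmitted, and layers
    # [i+1..] run on the server. Return the latency-minimizing choice.
    best_i, best_t = -1, float("inf")
    for i in candidates:
        t = (sum(l.edge_ms for l in layers[: i + 1])
             + transfer_ms(layers[i].out_elems, layers[i].act_bits,
                           bandwidth_mbps)
             + sum(l.server_ms for l in layers[i + 1:]))
        if t < best_t:
            best_i, best_t = i, t
    return best_i, best_t

# Illustrative profile of three candidate split layers (made-up numbers).
layers = [
    Layer("conv1", 4.0, 0.5, 64 * 112 * 112, 4),
    Layer("conv2", 6.0, 0.8, 128 * 56 * 56, 3),
    Layer("conv3", 8.0, 1.0, 256 * 28 * 28, 2),
]
idx, latency = best_split(layers, candidates=[0, 1, 2], bandwidth_mbps=20.0)
print(f"split after {layers[idx].name}: {latency:.1f} ms end-to-end")

Re-running best_split whenever the measured bandwidth changes is what makes the split dynamic; the NAS contribution enters through act_bits, which determines how cheap each candidate split point is to transmit.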
Pages: 2538-2545
Page count: 8