Dynamic Split Computing-Aware Mixed-Precision Quantization for Efficient Deep Edge Intelligence

Times cited: 0

Authors:

Nagamatsu, Naoki [1]

Hara-Azumi, Yuko [1]

Affiliation:

[1] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan

Source:

2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023 | 2024

Keywords:

Deep Neural Networks; Split Computing; Mixed-Precision Quantization; Neural Architecture Search

DOI:

10.1109/TrustCom60117.2023.00355

CLC number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Deploying large deep neural networks (DNNs) on IoT and mobile devices is challenging because of their limited hardware resources. To address this, an edge-cloud integration technique called split computing (SC) is an attractive way to reduce inference time: it splits a single DNN model into two sub-models, one processed on an edge device and the other on a server. Dynamic split computing (DSC) is an emerging extension of SC that determines the split point dynamically according to the communication conditions. In this work, we propose a DNN architecture optimization method for DSC. Our contributions are twofold. (1) First, we develop a DSC-aware mixed-precision quantization method that exploits neural architecture search (NAS). Using NAS, we efficiently search a huge design space for the optimal bitwidth of each layer, thereby constructing potential split points in the target DNN; the more potential split points there are, the more flexibly the DNN can select a split point to suit the communication conditions. (2) Second, to improve end-to-end inference time, we propose a new bitwidth-wise DSC (BW-DSC) algorithm that dynamically determines the optimal split point among the potential split points in the mixed-precision quantized DNN architecture. Our evaluation demonstrated that our method provides more effective split points than existing work while limiting inference accuracy degradation. In terms of end-to-end inference time specifically, our method achieved an average improvement of 16.47%, and up to 24.36%, over a state-of-the-art method.
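To make the dynamic split decision concrete, the following is a minimal sketch under a simple additive latency model that is assumed here, not taken from the paper: end-to-end time = edge compute for the first k layers + transmission of the activation at split k + server compute for the remaining layers. It is not the authors' BW-DSC algorithm, which the abstract does not specify; it is a plain exhaustive search over split points. The names `LayerProfile`, `transfer_ms`, `end_to_end_ms` and `best_split`, and every profile number, are hypothetical. The sketch only illustrates why a layer whose output is quantized to a low bitwidth can become an attractive split point, and why the best split can shift when bandwidth changes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayerProfile:
    """Hypothetical per-layer profile; all fields are illustrative assumptions."""
    edge_ms: float    # latency of this layer on the edge device
    server_ms: float  # latency of this layer on the server
    out_elems: int    # number of elements in this layer's output activation
    out_bits: int     # bitwidth assigned to that activation (mixed precision)


def transfer_ms(num_bits: float, bandwidth_mbps: float) -> float:
    """Time in ms to send num_bits over a link of bandwidth_mbps (Mbit/s)."""
    return num_bits / (bandwidth_mbps * 1e3)


def end_to_end_ms(layers: List[LayerProfile], split: int,
                  input_elems: int, input_bits: int,
                  bandwidth_mbps: float) -> float:
    """Modeled latency when layers[:split] run on the edge and layers[split:] on the server.

    split == 0 sends the raw input; split == len(layers) runs every layer on
    the edge and sends only the final output.
    """
    if split == 0:
        payload_bits = input_elems * input_bits
    else:
        last = layers[split - 1]
        payload_bits = last.out_elems * last.out_bits
    edge = sum(l.edge_ms for l in layers[:split])
    server = sum(l.server_ms for l in layers[split:])
    return edge + transfer_ms(payload_bits, bandwidth_mbps) + server


def best_split(layers: List[LayerProfile], input_elems: int, input_bits: int,
               bandwidth_mbps: float) -> Tuple[int, float]:
    """Exhaustively pick the split index with the lowest modeled latency."""
    candidates = [
        (k, end_to_end_ms(layers, k, input_elems, input_bits, bandwidth_mbps))
        for k in range(len(layers) + 1)
    ]
    return min(candidates, key=lambda c: c[1])


# Hypothetical four-layer profile; layer 2 emits a 2-bit activation.
layers = [
    LayerProfile(8.0, 1.0, 100_000, 8),
    LayerProfile(8.0, 1.0, 50_000, 2),   # low bitwidth: cheap to transmit
    LayerProfile(8.0, 1.0, 50_000, 8),
    LayerProfile(8.0, 1.0, 1_000, 8),
]
print(best_split(layers, 200_000, 8, bandwidth_mbps=10))  # (2, 28.0)
print(best_split(layers, 200_000, 8, bandwidth_mbps=1000)[0])  # 0: offload everything
```

On the slow link the 2-bit activation after the second layer wins because it is the cheapest tensor to transmit, while on the fast link sending the raw input to the server wins. A real system would replace the hypothetical profile with measured latencies and a bandwidth estimate updated at run time.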

Pages: 2538-2545

Page count: 8
