Dynamic Split Computing-Aware Mixed-Precision Quantization for Efficient Deep Edge Intelligence

Cited by: 0

Authors:

Nagamatsu, Naoki [1]

Hara-Azumi, Yuko [1]

Affiliations:

[1] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan

Source:

2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023 | 2024

Keywords:

Deep Neural Networks; Split Computing; Mixed-Precision Quantization; Neural Architecture Search

DOI:

10.1109/TrustCom60117.2023.00355

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Deploying large deep neural networks (DNNs) on IoT and mobile devices poses a significant challenge due to hardware resource limitations. To address this challenge, an edge-cloud integration technique called split computing (SC) can reduce inference time by splitting a single DNN model into two sub-models, one processed on an edge device and the other on a server. Dynamic split computing (DSC) is an emerging SC technique that determines the split point at run time according to the communication conditions. In this work, we propose a DNN architecture optimization method for DSC. Our contributions are twofold. (1) First, we develop a DSC-aware mixed-precision quantization method that exploits neural architecture search (NAS). Using NAS, we efficiently search a huge design space for the optimal bitwidth of each layer so as to construct potential split points in the target DNN; the more potential split points there are, the more flexibly the DNN architecture can select a split point to suit the communication conditions. (2) Second, to improve the end-to-end inference time, we propose a new bitwidth-wise DSC (BW-DSC) algorithm that dynamically determines the optimal split point among the potential split points in the mixed-precision quantized DNN architecture. Our evaluation demonstrated that our work provides more effective split points than existing works while mitigating the inference accuracy degradation. Specifically, in terms of end-to-end inference time, our work achieved an average improvement of 16.47% and up to 24.36% compared with a state-of-the-art work.
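
The general idea of choosing a split point according to communication conditions can be illustrated with a minimal sketch. This is not the BW-DSC algorithm from the paper, whose details the abstract does not give; it assumes a simple cost model in which end-to-end time is edge compute time plus transfer time plus server compute time, and in which the transferred payload is the number of activation values times their bitwidth. The `SplitPoint` class, the function names, and all latency and size figures are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitPoint:
    """A candidate split point (all fields are illustrative assumptions)."""
    name: str
    edge_ms: float    # time to run the layers before the split on the edge device
    server_ms: float  # time to run the remaining layers on the server
    elements: int     # number of activation values transmitted at the split
    bitwidth: int     # quantization bitwidth of the transmitted activations

    def payload_bits(self) -> int:
        # A lower bitwidth shrinks the data sent over the network.
        return self.elements * self.bitwidth


def end_to_end_ms(sp: SplitPoint, bandwidth_mbps: float) -> float:
    """Assumed cost model: edge compute + transfer + server compute."""
    bits_per_ms = bandwidth_mbps * 1000.0  # 1 Mbps = 1000 bits per millisecond
    return sp.edge_ms + sp.payload_bits() / bits_per_ms + sp.server_ms


def choose_split(candidates: list[SplitPoint], bandwidth_mbps: float) -> SplitPoint:
    """Pick the candidate with the smallest estimated end-to-end time."""
    return min(candidates, key=lambda sp: end_to_end_ms(sp, bandwidth_mbps))


# Hypothetical candidates: later splits cost more edge time but send less data.
CANDIDATES = [
    SplitPoint("early", edge_ms=2.0, server_ms=8.0, elements=200_000, bitwidth=8),
    SplitPoint("middle", edge_ms=10.0, server_ms=5.0, elements=50_000, bitwidth=4),
    SplitPoint("late", edge_ms=30.0, server_ms=1.0, elements=1_000, bitwidth=8),
]

if __name__ == "__main__":
    for mbps in (1.0, 100.0, 10_000.0):
        best = choose_split(CANDIDATES, mbps)
        print(f"{mbps:>8.0f} Mbps -> {best.name} ({end_to_end_ms(best, mbps):.2f} ms)")
```

Under these made-up numbers the preferred split moves earlier in the network as bandwidth grows, which is the behavior that motivates having several potential split points to choose from.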

Pages: 2538-2545

Page count: 8
