Dynamic Split Computing-Aware Mixed-Precision Quantization for Efficient Deep Edge Intelligence

Authors
Nagamatsu, Naoki [1]
Hara-Azumi, Yuko [1]
Affiliations
[1] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan
Keywords
Deep Neural Networks; Split Computing; Mixed-Precision Quantization; Neural Architecture Search
DOI
10.1109/TrustCom60117.2023.00355
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deploying large deep neural networks (DNNs) on IoT and mobile devices is challenging because of limited hardware resources. Split computing (SC), an edge-cloud integration technique, addresses this challenge by splitting a single DNN model into two sub-models, one processed on an edge device and the other on a server, to improve inference time. Dynamic split computing (DSC) is an emerging SC technique that determines the split point dynamically according to the communication conditions. In this work, we propose a DNN architecture optimization method for DSC. Our contributions are twofold. (1) We develop a DSC-aware mixed-precision quantization method that exploits neural architecture search (NAS). With NAS, we efficiently search a huge design space for the optimal bitwidth of each layer in order to construct potential split points in the target DNN; the more potential split points there are, the more flexibly the DNN architecture can select one to suit the communication conditions. (2) To improve the end-to-end inference time, we propose a new bitwidth-wise DSC (BW-DSC) algorithm that dynamically determines the optimal split point among the potential split points in the mixed-precision quantized DNN architecture. Our evaluation demonstrated that our work provides more effective split points than existing works while mitigating the degradation in inference accuracy. Specifically, in terms of end-to-end inference time, our work achieved an average improvement of 16.47% and a maximum improvement of 24.36% compared with a state-of-the-art work.
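The general idea of choosing a split point according to the communication conditions can be illustrated with a minimal sketch. This is not the BW-DSC algorithm from the paper, whose details the abstract does not give. It assumes a simple latency model (edge compute time + transfer time of the quantized activation + server compute time), and every name and number in it (SplitPoint, CANDIDATES, the timings, the tensor sizes, the bitwidths) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitPoint:
    """A hypothetical candidate split point (all fields are assumed values)."""
    name: str
    edge_ms: float      # time to run the layers before the split on the edge device
    server_ms: float    # time to run the remaining layers on the server
    num_elements: int   # number of activation elements sent at this split
    bitwidth: int       # bits per element of the quantized activation


def transfer_ms(sp: SplitPoint, bandwidth_mbps: float) -> float:
    """Time to send the quantized activation over the link, in milliseconds."""
    bits = sp.num_elements * sp.bitwidth
    return bits / (bandwidth_mbps * 1e6) * 1e3


def end_to_end_ms(sp: SplitPoint, bandwidth_mbps: float) -> float:
    """Assumed latency model: edge compute + transfer + server compute."""
    return sp.edge_ms + transfer_ms(sp, bandwidth_mbps) + sp.server_ms


def choose_split(candidates, bandwidth_mbps: float) -> SplitPoint:
    """Pick the candidate with the lowest modeled end-to-end latency."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    return min(candidates, key=lambda sp: end_to_end_ms(sp, bandwidth_mbps))


# Made-up candidates: later splits cost more edge time but send fewer bits.
CANDIDATES = [
    SplitPoint("early", edge_ms=1.0, server_ms=8.0, num_elements=800_000, bitwidth=8),
    SplitPoint("mid", edge_ms=10.0, server_ms=6.0, num_elements=100_000, bitwidth=4),
    SplitPoint("late", edge_ms=30.0, server_ms=1.0, num_elements=10_000, bitwidth=2),
]

if __name__ == "__main__":
    for mbps in (1000.0, 100.0, 5.0):
        best = choose_split(CANDIDATES, mbps)
        print(f"{mbps:7.1f} Mbps -> {best.name} ({end_to_end_ms(best, mbps):.2f} ms)")
```

Under these assumed numbers, a fast link favors the early split, while a slow link favors a late split that sends a small, low-bitwidth tensor. This is the kind of trade-off that having more potential split points is meant to expose.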
Pages: 2538-2545
Number of pages: 8