Dynamic Split Computing-Aware Mixed-Precision Quantization for Efficient Deep Edge Intelligence

Authors
Nagamatsu, Naoki [1]
Hara-Azumi, Yuko [1]
Affiliation
[1] Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8552, Japan
Keywords
Deep Neural Networks; Split Computing; Mixed-Precision Quantization; Neural Architecture Search
DOI
10.1109/TrustCom60117.2023.00355
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deploying large deep neural networks (DNNs) on IoT and mobile devices is challenging because of their limited hardware resources. Split computing (SC), an edge-cloud integration technique, addresses this challenge by splitting a single DNN model into two sub-models, one processed on the edge device and the other on a server, to reduce inference time. Dynamic split computing (DSC) is an emerging extension of SC that determines the split point dynamically according to the current communication conditions. In this work, we propose a DNN architecture optimization method for DSC. Our contributions are twofold. (1) We develop a DSC-aware mixed-precision quantization method based on neural architecture search (NAS). Using NAS, we efficiently search a huge design space for the optimal bitwidth of each layer, thereby constructing potential split points in the target DNN; the more potential split points the DNN has, the more flexibly it can select a split point suited to the communication conditions. (2) To reduce end-to-end inference time, we propose a new bitwidth-wise DSC (BW-DSC) algorithm that dynamically determines the optimal split point among the potential split points of the mixed-precision quantized DNN. Our evaluation demonstrated that our method provides more effective split points than existing methods while mitigating inference accuracy degradation. Specifically, our method achieved an average improvement of 16.47%, and up to 24.36%, in end-to-end inference time over a state-of-the-art method.
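To make the split-point trade-off concrete, the following Python sketch models the end-to-end latency of each candidate split point as edge compute time, plus the time to transmit the quantized intermediate tensor (element count × bitwidth ÷ link bandwidth), plus the remaining server compute time, and then picks the candidate with the smallest total for the current bandwidth. This is a generic dynamic split computing latency model under stated assumptions, not the authors' BW-DSC algorithm: the `SplitCandidate` fields, the function names, and all numbers are hypothetical, and the model ignores returning results to the device, protocol overhead, and queueing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SplitCandidate:
    """Hypothetical profile of one potential split point."""
    index: int            # split after this layer (0 = send the raw input)
    edge_ms: float        # edge compute time for layers before the split
    server_ms: float      # server compute time for layers after the split
    num_elements: int     # number of elements in the tensor sent to the server
    bitwidth: int         # bits per element of that tensor after quantization


def transfer_ms(c: SplitCandidate, bandwidth_mbps: float) -> float:
    """Time (ms) to transmit the quantized tensor over a link of the given bandwidth."""
    bits = c.num_elements * c.bitwidth
    return bits / (bandwidth_mbps * 1e6) * 1e3


def end_to_end_ms(c: SplitCandidate, bandwidth_mbps: float) -> float:
    """Edge compute + transmission + server compute (assumed: no result return, no queueing)."""
    return c.edge_ms + transfer_ms(c, bandwidth_mbps) + c.server_ms


def choose_split(cands: List[SplitCandidate], bandwidth_mbps: float) -> Tuple[SplitCandidate, float]:
    """Pick the candidate with the lowest estimated latency for the current bandwidth."""
    best = min(cands, key=lambda c: end_to_end_ms(c, bandwidth_mbps))
    return best, end_to_end_ms(best, bandwidth_mbps)


# Illustrative numbers only; not measurements from the paper.
candidates = [
    SplitCandidate(index=0, edge_ms=0.0, server_ms=5.0, num_elements=3 * 224 * 224, bitwidth=8),
    SplitCandidate(index=4, edge_ms=20.0, server_ms=3.0, num_elements=50_176, bitwidth=4),
    SplitCandidate(index=12, edge_ms=60.0, server_ms=1.0, num_elements=4_096, bitwidth=2),
]

if __name__ == "__main__":
    # Re-evaluate the split decision as the (hypothetical) link bandwidth changes.
    for mbps in (100.0, 20.0, 1.0):
        best, ms = choose_split(candidates, mbps)
        print(f"{mbps:>6.1f} Mbps -> split after layer {best.index} ({ms:.2f} ms)")
```

With these illustrative numbers, a fast link favors sending the raw input, a medium link favors the intermediate 4-bit split, and a very slow link favors keeping almost all computation on the edge. Because the transmission term scales with the bitwidth at the split, lowering the bitwidth of a layer's output shrinks its transmission cost, which is how quantization can turn otherwise expensive layers into practical split points.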
Pages: 2538-2545
Number of pages: 8