Dynamic Split Computing-Aware Mixed-Precision Quantization for Efficient Deep Edge Intelligence

Cited by: 0
Authors
Nagamatsu, Naoki [1 ]
Hara-Azumi, Yuko [1 ]
Affiliations
[1] Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8552, Japan
Keywords
Deep Neural Networks; Split Computing; Mixed-Precision Quantization; Neural Architecture Search
DOI
10.1109/TrustCom60117.2023.00355
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deploying large deep neural networks (DNNs) on IoT and mobile devices is challenging due to hardware resource limitations. To address this challenge, an edge-cloud integration technique called split computing (SC) is attractive: it improves inference time by splitting a single DNN model into two sub-models, one processed on an edge device and the other on a server. Dynamic split computing (DSC) is an emerging extension of SC that dynamically determines the split point depending on the communication conditions. In this work, we propose a DNN architecture optimization method for DSC. Our contributions are twofold. (1) First, we develop a DSC-aware mixed-precision quantization method that exploits neural architecture search (NAS). Through NAS, we efficiently explore the optimal bitwidth of each layer in a huge design space to construct potential split points in the target DNN; the more potential split points a DNN has, the more flexibly it can select one depending on the communication conditions. (2) Second, to improve end-to-end inference time, we propose a new bitwidth-wise DSC (BW-DSC) algorithm that dynamically determines the optimal split point among the potential split points in the mixed-precision quantized DNN. Our evaluation demonstrates that our method provides more effective split points than existing works while mitigating inference accuracy degradation. In terms of end-to-end inference time, it achieves an average improvement of 16.47%, and up to 24.36%, over a state-of-the-art work.
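The BW-DSC step described in the abstract amounts to minimizing an additive end-to-end latency model over candidate split points, where the transmission term shrinks with the activation bitwidth that the NAS assigned to the split layer. Below is a minimal Python sketch of that selection step, assuming a simple linear bandwidth model; every name, signature, and profiling number here is a hypothetical illustration, not the paper's actual implementation.

from dataclasses import dataclass

@dataclass
class Layer:
    # All fields are assumed profiling inputs, not the authors' data model.
    name: str
    edge_ms: float        # measured compute time of this layer on the edge device
    server_ms: float      # measured compute time of this layer on the server
    out_elems: int        # number of elements in this layer's output activation
    act_bits: int         # activation bitwidth chosen by the mixed-precision NAS

def transfer_ms(elems: int, bits: int, bandwidth_mbps: float) -> float:
    # Time to ship a quantized activation tensor over the link,
    # under an assumed ideal linear bandwidth model.
    return (elems * bits) / (bandwidth_mbps * 1e6) * 1e3

def best_split(layers: list[Layer], candidates: list[int],
               bandwidth_mbps: float) -> tuple[int, float]:
    # Splitting after layer i means layers [0..i] run on the edge,
    # layer i's quantized activation is transmitted, and layers
    # [i+1..] run on the server. Return the latency-minimizing choice.
    best_i, best_t = -1, float("inf")
    for i in candidates:
        t = (sum(l.edge_ms for l in layers[: i + 1])
             + transfer_ms(layers[i].out_elems, layers[i].act_bits,
                           bandwidth_mbps)
             + sum(l.server_ms for l in layers[i + 1:]))
        if t < best_t:
            best_i, best_t = i, t
    return best_i, best_t

# Illustrative profile of three candidate split layers (made-up numbers).
layers = [
    Layer("conv1", 4.0, 0.5, 64 * 112 * 112, 4),
    Layer("conv2", 6.0, 0.8, 128 * 56 * 56, 3),
    Layer("conv3", 8.0, 1.0, 256 * 28 * 28, 2),
]
idx, latency = best_split(layers, candidates=[0, 1, 2], bandwidth_mbps=20.0)
print(f"split after {layers[idx].name}: {latency:.1f} ms end-to-end")

Re-running best_split whenever the measured bandwidth changes is what makes the split dynamic; the NAS contribution enters through act_bits, which determines how cheap each candidate split point is to transmit.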
Pages: 2538-2545
Page count: 8