Dynamic Split Computing-Aware Mixed-Precision Quantization for Efficient Deep Edge Intelligence

Cited by: 0
Authors
Nagamatsu, Naoki [1]
Hara-Azumi, Yuko [1]
Affiliations
[1] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan
Keywords
Deep Neural Networks; Split Computing; Mixed-Precision Quantization; Neural Architecture Search
DOI
10.1109/TrustCom60117.2023.00355
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Deploying large deep neural networks (DNNs) on IoT and mobile devices poses a significant challenge due to hardware resource limitations. To address this challenge, an edge-cloud integration technique called split computing (SC) is attractive for improving inference time by splitting a single DNN model into two sub-models processed on an edge device and a server, respectively. Dynamic split computing (DSC) is an emerging extension of SC that dynamically determines the split point depending on the communication conditions. In this work, we propose a DNN architecture optimization method for DSC. Our contributions are twofold. (1) First, we develop a DSC-aware mixed-precision quantization method exploiting neural architecture search (NAS). Through NAS, we efficiently explore the optimal bitwidth of each layer from a huge design space to construct potential split points in the target DNN: the more potential split points there are, the more flexibly the DNN architecture can select a split point according to the communication conditions. (2) Second, to improve the end-to-end inference time, we propose a new bitwidth-wise DSC (BW-DSC) algorithm that dynamically determines the optimal split point among the potential split points in the mixed-precision quantized DNN architecture. Our evaluation demonstrated that our method provides more effective split points than existing works while mitigating inference accuracy degradation. In terms of end-to-end inference time, our method achieved an average of 16.47% and up to 24.36% improvement compared with a state-of-the-art work.
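The split-point selection idea described in the abstract can be pictured with a minimal latency model: for each candidate split point, the end-to-end time is the edge-side compute time up to the split, plus the time to transmit the quantized intermediate activation over the current link, plus the server-side compute time for the remaining layers; the candidate with the smallest estimate is chosen. The Python sketch below illustrates only this selection rule under assumed per-layer measurements; the names (Layer, end_to_end_ms, select_split) and numbers are hypothetical, and this is not the authors' BW-DSC algorithm.

# Minimal sketch of bandwidth-dependent split-point selection (illustrative only).
from dataclasses import dataclass
from typing import List

@dataclass
class Layer:
    edge_ms: float       # assumed latency of this layer on the edge device
    server_ms: float     # assumed latency of this layer on the server
    out_elements: int    # number of elements in the layer's output activation
    out_bits: int        # quantized bitwidth of that activation

def end_to_end_ms(layers: List[Layer], split: int, bandwidth_mbps: float) -> float:
    """Estimate latency when layers[:split] run on the edge and layers[split:] on the server."""
    edge = sum(l.edge_ms for l in layers[:split])
    server = sum(l.server_ms for l in layers[split:])
    if split == len(layers):
        comm = 0.0  # whole model stays on the edge device: nothing to transmit
    else:
        t = layers[split - 1]  # activation crossing the split is sent over the network
        comm = t.out_elements * t.out_bits / (bandwidth_mbps * 1e6) * 1e3  # ms
    return edge + comm + server

def select_split(layers: List[Layer], candidates: List[int], bandwidth_mbps: float) -> int:
    """Among the potential split points, return the one with the lowest estimated latency."""
    return min(candidates, key=lambda s: end_to_end_ms(layers, s, bandwidth_mbps))

# Example with made-up measurements for a 4-layer model and 8-/4-bit activations.
model = [
    Layer(2.0, 0.4, 32 * 32 * 16, 8),
    Layer(3.0, 0.6, 16 * 16 * 32, 4),
    Layer(4.0, 0.8, 8 * 8 * 64, 4),
    Layer(1.0, 0.2, 10, 8),
]
print(select_split(model, candidates=[1, 2, 3, 4], bandwidth_mbps=5.0))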
Pages: 2538-2545
Number of pages: 8
Related Papers
44 in total
  • [1] Mixed-Precision Neural Architecture Search and Dynamic Split Point Selection for Split Computing
    Nagamatsu, Naoki
    Ise, Kenshiro
    Hara, Yuko
    IEEE ACCESS, 2024, 12 : 137439 - 137454
  • [2] AMED: Automatic Mixed-Precision Quantization for Edge Devices
    Kimhi, Moshe
    Rozen, Tal
    Mendelson, Avi
    Baskin, Chaim
    MATHEMATICS, 2024, 12 (12)
  • [3] HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
    Dong, Zhen
    Yao, Zhewei
    Gholami, Amir
    Mahoney, Michael W.
    Keutzer, Kurt
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 293 - 302
  • [4] Mixed-precision quantization-aware training for photonic neural networks
    Kirtas, Manos
    Passalis, Nikolaos
    Oikonomou, Athina
    Moralis-Pegios, Miltos
    Giamougiannis, George
    Tsakyridis, Apostolos
    Mourgias-Alexandris, George
    Pleros, Nikolaos
    Tefas, Anastasios
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (29): : 21361 - 21379
  • [5] Joint Pruning and Channel-Wise Mixed-Precision Quantization for Efficient Deep Neural Networks
    Motetti, Beatrice Alessandra
    Risso, Matteo
    Burrello, Alessio
    Macii, Enrico
    Poncino, Massimo
    Pagliari, Daniele Jahier
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (11) : 2619 - 2633
  • [6] MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization
    Zhu, Zeyu
    Li, Fanrong
    Li, Gang
    Liu, Zejian
    Mo, Zitao
    Hu, Qinghao
    Liang, Xiaoyao
    Cheng, Jian
    2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024, 2024, : 124 - 138
  • [7] Edge-MPQ: Layer-Wise Mixed-Precision Quantization With Tightly Integrated Versatile Inference Units for Edge Computing
    Zhao, Xiaotian
    Xu, Ruge
    Gao, Yimin
    Verma, Vaibhav
    Stan, Mircea R.
    Guo, Xinfei
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (11) : 2504 - 2519
  • [8] Mixed-precision Deep Neural Network Quantization With Multiple Compression Rates
    Wang, Xuanda
    Fei, Wen
    Dai, Wenrui
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    2023 DATA COMPRESSION CONFERENCE, DCC, 2023, : 371 - 371