Dynamic Split Computing-Aware Mixed-Precision Quantization for Efficient Deep Edge Intelligence

Citations: 0
Authors
Nagamatsu, Naoki [1]
Hara-Azumi, Yuko [1]
Affiliations
[1] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan
Keywords
Deep Neural Networks; Split Computing; Mixed-Precision Quantization; Neural Architecture Search
DOI
10.1109/TrustCom60117.2023.00355
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deploying large deep neural networks (DNNs) on IoT and mobile devices is challenging because of their limited hardware resources. Split computing (SC), an edge-cloud integration technique, addresses this challenge by splitting a single DNN model into two sub-models, one processed on an edge device and the other on a server, to reduce inference time. Dynamic split computing (DSC) is an emerging SC technique that determines the split point dynamically according to the communication conditions. In this work, we propose a DNN architecture optimization method for DSC. Our contributions are twofold. (1) We develop a DSC-aware mixed-precision quantization method that exploits neural architecture search (NAS). Using NAS, we efficiently explore a huge design space for the optimal bitwidth of each layer so as to construct potential split points in the target DNN; the more potential split points there are, the more flexibly the DNN architecture can select a split point to suit the communication conditions. (2) To improve the end-to-end inference time, we propose a new bitwidth-wise DSC (BW-DSC) algorithm that dynamically determines the optimal split point among the potential split points in the mixed-precision quantized DNN architecture. Our evaluation demonstrates that our work provides more effective split points than existing works while mitigating inference accuracy degradation. Specifically, in terms of end-to-end inference time, our work achieves an average improvement of 16.47% and a maximum improvement of 24.36% over a state-of-the-art work.
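The idea of choosing a split point according to the communication conditions can be illustrated with a minimal Python sketch. This is not the BW-DSC algorithm from the paper, whose details the abstract does not give. It is a generic latency model that assumes end-to-end time is the sum of edge compute time, transfer time of the intermediate activations, and server compute time. The names `SplitPoint`, `transfer_time_ms`, `end_to_end_ms`, and `choose_split`, as well as all timing and size values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SplitPoint:
    """A candidate split point (all fields are illustrative assumptions)."""
    layer_index: int       # the model is split after this layer
    edge_time_ms: float    # time to run the first sub-model on the edge device
    server_time_ms: float  # time to run the second sub-model on the server
    num_elements: int      # number of activation values sent at the split
    bitwidth: int          # bits per activation value at the split


def transfer_time_ms(sp: SplitPoint, bandwidth_mbps: float) -> float:
    """Time to send the quantized activations over the link, in milliseconds."""
    bits = sp.num_elements * sp.bitwidth
    return bits / (bandwidth_mbps * 1e6) * 1e3


def end_to_end_ms(sp: SplitPoint, bandwidth_mbps: float) -> float:
    """Assumed latency model: edge compute + transfer + server compute."""
    return sp.edge_time_ms + transfer_time_ms(sp, bandwidth_mbps) + sp.server_time_ms


def choose_split(candidates: List[SplitPoint], bandwidth_mbps: float) -> SplitPoint:
    """Pick the candidate with the lowest modeled end-to-end latency."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    return min(candidates, key=lambda sp: end_to_end_ms(sp, bandwidth_mbps))


# Made-up candidates: earlier splits send more data at a higher bitwidth,
# later splits send less data at a lower bitwidth but load the edge device more.
candidates = [
    SplitPoint(layer_index=2, edge_time_ms=2.0, server_time_ms=30.0,
               num_elements=100_000, bitwidth=8),
    SplitPoint(layer_index=6, edge_time_ms=20.0, server_time_ms=20.0,
               num_elements=50_000, bitwidth=4),
    SplitPoint(layer_index=10, edge_time_ms=45.0, server_time_ms=5.0,
               num_elements=10_000, bitwidth=2),
]

for bw in (200.0, 50.0, 5.0):
    best = choose_split(candidates, bw)
    print(f"{bw} Mbps: split after layer {best.layer_index}")
```

Under this assumed model, a fast link favors an early split, because transferring a large activation tensor is cheap, while a slow link favors a late split with a small, low-bitwidth tensor. That is why having more candidate split points gives the selection more room to adapt.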
Pages: 2538-2545
Page count: 8