Dynamic Split Computing-Aware Mixed-Precision Quantization for Efficient Deep Edge Intelligence

Times cited: 0

Authors:

Nagamatsu, Naoki [1]

Hara-Azumi, Yuko [1]

Affiliation:

[1] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan

Source:

2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023 | 2024

Keywords:

Deep Neural Networks; Split Computing; Mixed-Precision Quantization; Neural Architecture Search

DOI:

10.1109/TrustCom60117.2023.00355

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Deploying large deep neural networks (DNNs) on IoT and mobile devices is challenging because of their limited hardware resources. To address this challenge, split computing (SC), an edge-cloud integration technique, is an attractive way to reduce inference time: a single DNN model is split into two sub-models, one processed on an edge device and the other on a server. Dynamic split computing (DSC) is an emerging extension of SC that determines the split point dynamically according to the communication conditions. In this work, we propose a DNN architecture optimization method for DSC, with two contributions. (1) First, we develop a DSC-aware mixed-precision quantization method based on neural architecture search (NAS). Using NAS, we efficiently explore the optimal bitwidth of each layer in a huge design space to construct potential split points in the target DNN; the more potential split points there are, the more flexibly the DNN can select a split point according to the communication conditions. (2) Second, to improve end-to-end inference time, we propose a new bitwidth-wise DSC (BW-DSC) algorithm that dynamically determines the optimal split point among the potential split points of the mixed-precision quantized DNN. Our evaluation demonstrated that our method provides more effective split points than existing work while mitigating inference accuracy degradation. Specifically, in terms of end-to-end inference time, our method achieved an average improvement of 16.47% and up to 24.36% compared with a state-of-the-art method.
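The abstract does not spell out how BW-DSC works internally. As a rough illustration of the general idea behind dynamic split-point selection, the Python sketch below picks, for a given network bandwidth, the candidate split point that minimizes an assumed latency model: edge compute time, plus transmission time of the quantized intermediate tensor, plus server compute time. The `SplitCandidate` fields, the latency model, and all numbers are hypothetical assumptions made for illustration; this is not the authors' BW-DSC algorithm, and the values are not measurements from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SplitCandidate:
    """Hypothetical latency profile of one candidate split point (assumed fields)."""
    layer_index: int   # split after this layer (illustrative)
    edge_ms: float     # assumed edge-device latency for layers before the split
    server_ms: float   # assumed server latency for layers after the split
    num_elements: int  # number of elements in the intermediate tensor to send
    bitwidth: int      # bits per element after mixed-precision quantization


def transfer_ms(c: SplitCandidate, bandwidth_mbps: float) -> float:
    """Transmission time of the quantized tensor; ignores protocol overhead and RTT."""
    bits = c.num_elements * c.bitwidth
    # 1 Mbps = 1e6 bits/s = 1e3 bits per millisecond
    return bits / (bandwidth_mbps * 1e3)


def end_to_end_ms(c: SplitCandidate, bandwidth_mbps: float) -> float:
    """Assumed latency model: edge compute + transfer + server compute."""
    return c.edge_ms + transfer_ms(c, bandwidth_mbps) + c.server_ms


def choose_split(candidates: Sequence[SplitCandidate],
                 bandwidth_mbps: float) -> SplitCandidate:
    """Return the candidate with the smallest modeled end-to-end latency."""
    if not candidates:
        raise ValueError("need at least one candidate split point")
    return min(candidates, key=lambda c: end_to_end_ms(c, bandwidth_mbps))


if __name__ == "__main__":
    # Made-up profiles for illustration only (not results from the paper).
    candidates = [
        SplitCandidate(layer_index=2, edge_ms=5.0, server_ms=10.0,
                       num_elements=200_000, bitwidth=8),
        SplitCandidate(layer_index=6, edge_ms=20.0, server_ms=4.0,
                       num_elements=50_000, bitwidth=4),
    ]
    for bw in (100.0, 1000.0):
        best = choose_split(candidates, bw)
        print(f"{bw:g} Mbps -> split after layer {best.layer_index}, "
              f"{end_to_end_ms(best, bw):.1f} ms")
```

With these made-up profiles, the deeper split with a 4-bit intermediate tensor wins at 100 Mbps (26 ms versus 31 ms), while the shallower split wins at 1000 Mbps (about 16.6 ms versus 24.2 ms). This is the intuition behind wanting more low-bitwidth candidate split points: the selector has more options to adapt as communication conditions change.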


Pages: 2538–2545

Number of pages: 8

44 entries in total (entries [21] to [30] listed here)

[21] CMQ: Crossbar-Aware Neural Network Mixed-Precision Quantization via Differentiable Architecture Search
Peng, Jie
Liu, Haijun
Zhao, Zhongjin
Li, Zhiwei
Liu, Sen
Li, Qingjiang
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (11) : 4124 - 4133
[22] Design Space Exploration of Layer-Wise Mixed-Precision Quantization with Tightly Integrated Edge Inference Units
Zhao, Xiaotian
Gao, Yimin
Verma, Vaibhav
Xu, Ruge
Stan, Mircea
Guo, Xinfei
PROCEEDINGS OF THE GREAT LAKES SYMPOSIUM ON VLSI 2023, GLSVLSI 2023, 2023 : 467 - 471
[23] Highly-Adaptive Mixed-Precision MAC Unit for Smart and Low-Power Edge Computing
Devic, Guillaume
France-Pillois, Maxime
Salles, Jeremie
Sassatelli, Gilles
Gamatie, Abdoulaye
2021 19TH IEEE INTERNATIONAL NEW CIRCUITS AND SYSTEMS CONFERENCE (NEWCAS), 2021
[24] Quantization aware approximate multiplier and hardware accelerator for edge computing of deep learning applications
Reddy, K. Manikantta
Vasantha, M. H.
Kumar, Y. B. Nithin
Gopal, Ch. Keshava
Dwivedi, Devesh
INTEGRATION-THE VLSI JOURNAL, 2021, 81 : 268 - 279
[25] Mix-GEMM: An efficient HW-SW Architecture for Mixed-Precision Quantized Deep Neural Networks Inference on Edge Devices
Reggiani, Enrico
Pappalardo, Alessandro
Doblas, Max
Moreto, Miquel
Olivieri, Mauro
Unsal, Osman Sabri
Cristal, Adrian
2023 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA, 2023 : 1085 - 1098
[26] Efficient Dynamic Reconfigurable CNN Accelerator for Edge Intelligence Computing on FPGA
Shi, Kaisheng
Wang, Mingwei
Tan, Xin
Li, Qianghua
Lei, Tao
INFORMATION, 2023, 14 (03)
[27] Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge
Lu, Yao
Rodriguez, Hiram Rayo Torres
Vogel, Sebastian
van de Waterlaat, Nick
Jancura, Pavol
PROCEEDINGS 2023 IEEE/ACM INTERNATIONAL WORKSHOP ON COMPILERS, DEPLOYMENT, AND TOOLING FOR EDGE AI, CODAI 2023, 2023 : 1 - 5
[28] A 12.1 TOPS/W Mixed-precision Quantized Deep Convolutional Neural Network Accelerator for Low Power on Edge / Endpoint Device
Isono, Takanori
Yamakura, Makoto
Shimaya, Satoshi
Kawamoto, Isao
Tsuboi, Nobuhiro
Mineo, Masaaki
Nakajima, Wataru
Ishida, Kenichi
Sasaki, Shin
Higuchi, Toshio
Hoshaku, Masahiro
Murakami, Daisuke
Iwasaki, Toshifumi
Hirai, Hiroshi
2020 IEEE ASIAN SOLID-STATE CIRCUITS CONFERENCE (A-SSCC), 2020
[29] Efficient Fixed/Floating-Point Merged Mixed-Precision Multiply-Accumulate Unit for Deep Learning Processors
Zhang, Hao
Lee, Hyuk Jae
Ko, Seok-Bum
2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2018
[30] A Dynamic Execution Neural Network Processor for Fine-Grained Mixed-Precision Model Training Based on Online Quantization Sensitivity Analysis
Liu, Ruoyang
Wei, Chenhan
Yang, Yixiong
Wang, Wenxun
Yuan, Binbin
Yang, Huazhong
Liu, Yongpan
IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2024, 59 (09) : 3082 - 3093
