MDP: Model Decomposition and Parallelization of Vision Transformer for Distributed Edge Inference

Times cited: 0

Authors:

Wang, Weiyan [1]

Zhang, Yiming [2]

Jin, Yilun [1]

Tian, Han [1]

Chen, Li [3]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China

[2] Xiamen Univ, Xiamen, Peoples R China

[3] Zhongguancun Lab, Beijing, Peoples R China

Source:

2023 19TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN 2023 | 2023

Funding:

National Key Research and Development Program of China;

Keywords:

Distributed edge inference; vision transformers; boosting ensemble;

DOI:

10.1109/MSN60784.2023.00086

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Distributed edge inference has emerged as a promising paradigm for speeding up inference. Previous works realize it by physically partitioning CNNs, but vision transformers pose the following challenges: (1) high communication costs for the large model; (2) stragglers caused by heterogeneous devices; and (3) time-out exceptions due to unstable edge devices. We therefore propose Model Decomposition and Parallelization (MDP), a novel approach for large vision transformers. Inspired by the implicit boosting ensemble in the vision transformer, MDP decomposes it into an explicit boosting ensemble of different sub-models that run in parallel. It trains all sub-models sequentially to gradually reduce the residual errors. To minimize dependency and communication among sub-models, we adopt stacking distillation, which gives every sub-model extra information about the others for better error correction. Different sub-models can take both different image sizes and different model sizes, so they can run on heterogeneous devices and increase ensemble diversity. To handle time-out exceptions, we add vanilla supervised learning to every sub-model, forming a bagging ensemble that is used if the boosting ensemble terminates early. As a result, all sub-models can run in parallel with little communication and adapt to heterogeneous devices, while maintaining accuracy even under time-out exceptions. Experiments show that MDP outperforms other baselines by 2.1x to 5.2x in latency and 1.7x to 5.1x in throughput with comparable accuracy.
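The inference-time behavior summarized in the abstract (an additive boosting ensemble with a bagging fallback on time-out) can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes each sub-model exposes two logit vectors, a boosting head trained on residuals and a standalone head trained with vanilla supervised learning, and it simplifies the fallback rule to "use bagging whenever any sub-model misses the deadline". The names `SubModelOutput` and `aggregate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubModelOutput:
    # Hypothetical output format: logits from a boosting (residual) head and
    # from a standalone head trained with vanilla supervised learning.
    boost_logits: List[float]
    standalone_logits: List[float]


def aggregate(outputs: List[Optional[SubModelOutput]]) -> List[float]:
    """Combine sub-model outputs; None marks a sub-model that timed out."""
    arrived = [o for o in outputs if o is not None]
    if not arrived:
        raise RuntimeError("no sub-model finished before the deadline")
    num_classes = len(arrived[0].boost_logits)
    if len(arrived) == len(outputs):
        # Boosting path: every sub-model contributed its residual correction,
        # so the ensemble prediction is the sum of the boosting logits.
        return [sum(o.boost_logits[c] for o in arrived) for c in range(num_classes)]
    # Bagging fallback: average the standalone logits of the sub-models that arrived.
    return [
        sum(o.standalone_logits[c] for o in arrived) / len(arrived)
        for c in range(num_classes)
    ]


if __name__ == "__main__":
    outs = [
        SubModelOutput(boost_logits=[2.0, 0.5], standalone_logits=[1.5, 0.25]),
        SubModelOutput(boost_logits=[-0.5, 0.75], standalone_logits=[1.0, 0.5]),
    ]
    print(aggregate(outs))             # all arrived, boosting sum: [1.5, 1.25]
    print(aggregate([outs[0], None]))  # second timed out, bagging: [1.5, 0.25]
```

Summing the boosting heads only makes sense when every residual correction is present; when a sub-model is missing, each standalone head is still a complete classifier on its own, which is why the sketch averages those instead.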

Pages: 570-578

Number of pages: 9
