MDP: Model Decomposition and Parallelization of Vision Transformer for Distributed Edge Inference

Cited by: 0
Authors
Wang, Weiyan [1 ]
Zhang, Yiming [2 ]
Jin, Yilun [1 ]
Tian, Han [1 ]
Chen, Li [3 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Xiamen Univ, Xiamen, Peoples R China
[3] Zhongguancun Lab, Beijing, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Distributed edge inference; vision transformers; boosting ensemble;
DOI
10.1109/MSN60784.2023.00086
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Distributed edge inference has emerged as a promising paradigm for speeding up inference. Previous works realize it by physically partitioning CNNs, but vision transformers raise the following challenges: (1) high communication costs for the large model; (2) stragglers caused by heterogeneous devices; (3) time-out exceptions due to unstable edge devices. We therefore propose Model Decomposition and Parallelization (MDP), a novel scheme for large vision transformers. Inspired by the implicit boosting ensemble inside the vision transformer, MDP decomposes it into an explicit boosting ensemble of different, parallel sub-models and trains the sub-models sequentially to gradually reduce the residual errors. To minimize dependency and communication among sub-models, we adopt stacking distillation, which gives every sub-model extra information about the others for better error correction. Sub-models can take different image sizes as well as different model sizes, so they both fit heterogeneous devices and improve ensemble diversity. To handle time-out exceptions, we additionally train every sub-model with vanilla supervised learning, so that a bagging ensemble remains available if the boosting ensemble terminates early. As a result, all sub-models not only run in parallel with little communication but also adapt to heterogeneous devices, while maintaining accuracy even under time-out exceptions. Experiments show that MDP outperforms other baselines by 2.1x to 5.2x in latency and 1.7x to 5.1x in throughput with comparable accuracy.
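The abstract outlines an inference scheme (parallel heterogeneous sub-models, boosting-style aggregation, and a bagging fallback when devices time out) that a short sketch can make concrete. The Python sketch below only illustrates that flow under stated assumptions and is not the authors' implementation: the names SubModel and ensemble_predict, the stand-in sub-model architecture, the timeout value, and the sum/mean aggregation rules are all hypothetical.

```python
# Minimal sketch of the inference-time behaviour described in the abstract.
# NOT the authors' code: SubModel, ensemble_predict, the timeout value and
# the sum/mean aggregation rules are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import torch
import torch.nn as nn
import torch.nn.functional as F


class SubModel(nn.Module):
    """Stand-in for one decomposed sub-model. In MDP each sub-model would be
    a small vision transformer with its own input resolution and width,
    sized to fit a particular edge device."""

    def __init__(self, image_size: int, width: int, num_classes: int = 10):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * image_size * image_size, width),
            nn.GELU(),
            nn.Linear(width, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each device works at its own input resolution.
        x = F.interpolate(x, size=(self.image_size, self.image_size))
        return self.net(x)


def ensemble_predict(sub_models, x, timeout_s=1.0):
    """Run all sub-models in parallel (one worker per simulated device).
    If every sub-model answers in time, sum the outputs (boosting view:
    each sub-model corrects the residual error of the others). If some
    time out, average whatever arrived (bagging fallback, made possible
    by the extra vanilla supervised loss on every sub-model)."""
    with ThreadPoolExecutor(max_workers=len(sub_models)) as pool:
        futures = [pool.submit(lambda m=m: m(x)) for m in sub_models]
        outputs = []
        for f in futures:
            try:
                outputs.append(f.result(timeout=timeout_s))
            except TimeoutError:
                continue  # straggler or unstable device: drop its output
    if not outputs:
        raise RuntimeError("no sub-model finished before the deadline")
    stacked = torch.stack(outputs)
    if len(outputs) == len(sub_models):
        return stacked.sum(dim=0)   # full boosting ensemble
    return stacked.mean(dim=0)      # partial bagging ensemble


if __name__ == "__main__":
    # Heterogeneous sub-models: different image sizes and model widths.
    models = [SubModel(16, 64), SubModel(24, 128), SubModel(32, 256)]
    images = torch.randn(4, 3, 32, 32)
    logits = ensemble_predict(models, images)
    print(logits.shape)  # torch.Size([4, 10])
```

In a real MDP deployment the thread pool would stand in for RPC calls to the edge devices, and each SubModel would be one of the distilled ViT sub-models trained sequentially on the residual errors of the others.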
Pages: 570-578
Page count: 9
Related papers (50 records in total)
• [21] Lee, Seung Il; Koo, Kwanghyun; Lee, Jong Ho; Lee, Gilha; Jeong, Sangbeom; Seongjun, O.; Kim, Hyun. Vision transformer models for mobile/edge devices: a survey. MULTIMEDIA SYSTEMS, 2024, 30 (02).
• [22] Sato, Yuji; Midtlyng, Mads; Sato, Mikiko. Diffusely Distributed Parallelization of MOEA/D with Edge Weight Vectors Sharing. PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2023 COMPANION, 2023: 411-414.
• [23] Zuo, Xiaojiang; Luopan, Yaxin; Han, Rui; Zhang, Qinglong; Liu, Chi Harold; Wang, Guoren; Chen, Lydia Y. FedViT: Federated continual learning of vision transformer at edge. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 154: 1-15.
• [24] von Ramm, Alexander; Weismueller, Jens; Kurtz, Wolfgang; Neckel, Tobias. Comparing Domain Decomposition Methods for the Parallelization of Distributed Land Surface Models. COMPUTATIONAL SCIENCE - ICCS 2019, PT I, 2019, 11536: 197-210.
• [25] Graham, Ben; El-Nouby, Alaaeldin; Touvron, Hugo; Stock, Pierre; Joulin, Armand; Jegou, Herve; Douze, Matthijs. LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 12239-12249.
• [26] Wang, Li; Li, Liang; Xu, Lianming; Peng, Xian; Fei, Aiguo. Failure-Resilient Distributed Inference With Model Compression Over Heterogeneous Edge Devices. IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (12): 12680-12692.
• [27] Naveen, Soumyalatha; Kounte, Manjunath R.; Ahmed, Mohammed Riyaz. Low Latency Deep Learning Inference Model for Distributed Intelligent IoT Edge Clusters. IEEE ACCESS, 2021, 9: 160607-160621.
• [28] Reidy, Brendan; Mohammadi, Mohammadreza; Elbtity, Mohammed; Smith, Heath; Zand, Ramtin. Real-time Transformer Inference on Edge AI Accelerators. 2023 IEEE 29TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM, RTAS, 2023: 341-344.
• [29] Percebois, C.; Signes, N.; Agnoletto, P. A Compiler for a Distributed Inference Model. LECTURE NOTES IN COMPUTER SCIENCE, 1991, 487: 412-421.
• [30] Yin, Biao; Dridi, Mahjoub; El Moudni, Abdellah. Traffic Control Model and Algorithm Based on Decomposition of MDP. 2014 INTERNATIONAL CONFERENCE ON CONTROL, DECISION AND INFORMATION TECHNOLOGIES (CODIT), 2014: 225-230.