MDP: Model Decomposition and Parallelization of Vision Transformer for Distributed Edge Inference

Cited by: 0
|
Authors
Wang, Weiyan [1 ]
Zhang, Yiming [2 ]
Jin, Yilun [1 ]
Tian, Han [1 ]
Chen, Li [3 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Xiamen Univ, Xiamen, Peoples R China
[3] Zhongguancun Lab, Beijing, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Distributed edge inference; vision transformers; boosting ensemble;
DOI
10.1109/MSN60784.2023.00086
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Distributed edge inference is emerging as a promising paradigm for speeding up inference. Previous works realize it by physically partitioning CNNs, but vision transformers pose the following challenges: (1) high communication costs due to the large model; (2) stragglers caused by heterogeneous devices; (3) time-out exceptions due to unstable edge devices. We therefore propose Model Decomposition and Parallelization (MDP), a novel scheme for large vision transformers. Inspired by the implicit boosting ensemble within the vision transformer, MDP decomposes it into an explicit boosting ensemble of distinct, parallel sub-models, which are trained sequentially to gradually reduce the residual errors. To minimize dependency and communication among sub-models, we adopt stacking distillation, which gives every sub-model extra information about the others for better error correction. Different sub-models can take both different image sizes and different model sizes, allowing them to run on heterogeneous devices while improving ensemble diversity. To handle time-out exceptions, we additionally apply vanilla supervised learning to every sub-model, forming a bagging ensemble in case the boosting ensemble terminates early. As a result, all sub-models not only run in parallel with little communication but also adapt to heterogeneous devices, while maintaining accuracy even under time-out exceptions. Experiments show that MDP outperforms other baselines by 5.2×~2.1× in latency and 5.1×~1.7× in throughput with comparable accuracy.
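The abstract's combination of a boosting ensemble (sub-models summed to cancel residual errors) with a bagging fallback (averaging whatever outputs arrive before a deadline) can be illustrated with a minimal sketch. This is our own toy illustration, not the paper's implementation: the stub sub-models and the function `mdp_style_combine` are hypothetical names, and each sub-model is reduced to a simple linear map standing in for a vision-transformer branch on one edge device.

```python
import numpy as np

# Toy stand-ins for sub-models deployed on heterogeneous edge devices.
# In the paper these would be differently sized vision-transformer
# branches; here each one just scales its input.
def submodel_small(x):   # fast device, small model
    return 0.6 * x

def submodel_medium(x):  # medium device
    return 0.3 * x

def submodel_large(x):   # slow device, prone to time-outs
    return 0.1 * x

NUM_SUBMODELS = 3

def mdp_style_combine(outputs_received):
    """Combine whichever sub-model outputs arrived before the deadline.

    If every sub-model responded, sum the outputs (boosting ensemble:
    each sub-model was trained to correct the residual error left by
    the others).  If some timed out, average the available outputs
    (bagging ensemble): because each sub-model is also trained with
    vanilla supervised learning, it remains a usable predictor alone.
    """
    if len(outputs_received) == NUM_SUBMODELS:
        return sum(outputs_received)                      # boosting: residual sum
    return sum(outputs_received) / len(outputs_received)  # bagging fallback

x = np.array([1.0, 2.0])

# All three sub-models finish on time -> boosting ensemble.
full = mdp_style_combine([submodel_small(x), submodel_medium(x), submodel_large(x)])

# The slow device times out -> bagging fallback over the two survivors.
partial = mdp_style_combine([submodel_small(x), submodel_medium(x)])
```

Because the toy scale factors sum to 1.0, the full boosting path recovers the input exactly, while the fallback path degrades gracefully rather than failing, mirroring the paper's claim of maintained accuracy under time-out exceptions.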
Pages: 570-578
Page count: 9
Related Papers
50 records in total
  • [1] An Autonomous Parallelization of Transformer Model Inference on Heterogeneous Edge Devices
    Lee, Juhyeon
    Bahk, Insung
    Kim, Hoseung
    Jeong, Sinjin
    Lee, Suyeon
    Min, Donghyun
    PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024, 2024, : 50 - 61
  • [2] ViTA: A Vision Transformer Inference Accelerator for Edge Applications
    Nag, Shashank
    Datta, Gourav
    Kundu, Souvik
    Chandrachoodan, Nitin
    Beerel, Peter A.
    2023 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS, 2023,
  • [3] When the Edge Meets Transformers: Distributed Inference with Transformer Models
    Hu, Chenghao
    Li, Baochun
    2024 IEEE 44TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, ICDCS 2024, 2024, : 82 - 92
  • [4] Model and system robustness in distributed CNN inference at the edge
    Guo, Xiaotian
    Jiang, Quan
    Pimentel, Andy D.
    Stefanov, Todor
    INTEGRATION-THE VLSI JOURNAL, 2025, 100
  • [5] Parallelization of a distributed ecohydrological model
    Liu, Ning
    Shaikh, Mohsin Ahmed
    Kala, Jatin
    Harper, Richard J.
    Dell, Bernard
    Liu, Shirong
    Sun, Ge
    ENVIRONMENTAL MODELLING & SOFTWARE, 2018, 101 : 51 - 63
  • [6] Transformer Inference Acceleration in Edge Computing Environment
    Li, Mingchu
    Zhang, Wenteng
    Xia, Dexin
    2023 IEEE/ACM 23RD INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING WORKSHOPS, CCGRIDW, 2023, : 104 - 109
  • [7] On Model Coding for Distributed Inference and Transmission in Mobile Edge Computing Systems
    Zhang, Jingjing
    Simeone, Osvaldo
    IEEE COMMUNICATIONS LETTERS, 2019, 23 (06) : 1065 - 1068
  • [8] MODEL-DISTRIBUTED INFERENCE IN MULTI-SOURCE EDGE NETWORKS
    Li, Pengzhen
    Seferoglu, Hulya
    Koyuncu, Erdem
    2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023,
  • [9] Adaptive and Resilient Model-Distributed Inference in Edge Computing Systems
    Li, Pengzhen
    Koyuncu, Erdem
    Seferoglu, Hulya
    IEEE OPEN JOURNAL OF THE COMMUNICATIONS SOCIETY, 2023, 4 : 1263 - 1273
  • [10] Communication-Efficient Model Parallelism for Distributed In-Situ Transformer Inference
    Wei, Yuanxin
    Ye, Shengyuan
    Jiang, Jiazhi
    Chen, Xu
    Huang, Dan
    Du, Jiangsu
    Lu, Yutong
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,