MDP: Model Decomposition and Parallelization of Vision Transformer for Distributed Edge Inference

Cited by: 0
|
Authors
Wang, Weiyan [1 ]
Zhang, Yiming [2 ]
Jin, Yilun [1 ]
Tian, Han [1 ]
Chen, Li [3 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Xiamen Univ, Xiamen, Peoples R China
[3] Zhongguancun Lab, Beijing, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Distributed edge inference; vision transformers; boosting ensemble;
DOI
10.1109/MSN60784.2023.00086
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Distributed edge inference is emerging as a promising paradigm for speeding up inference. Previous works realize it by physically partitioning CNNs, but vision transformers pose the following challenges: (1) high communication costs due to the large model; (2) stragglers caused by heterogeneous devices; (3) time-out exceptions due to unstable edge devices. We therefore propose Model Decomposition and Parallelization (MDP), a novel scheme for large vision transformers. Inspired by the implicit boosting ensemble within the vision transformer, MDP decomposes it into an explicit boosting ensemble of distinct, parallel sub-models, which are trained sequentially to gradually reduce the residual errors. To minimize dependency and communication among sub-models, we adopt stacking distillation, which gives every sub-model extra information about the others for better error correction. Different sub-models can take both different image sizes and different model sizes, allowing them to run on heterogeneous devices while improving ensemble diversity. To handle time-out exceptions, we additionally apply vanilla supervised learning to every sub-model, forming a bagging ensemble in case the boosting ensemble terminates early. As a result, all sub-models not only run in parallel with little communication but also adapt to heterogeneous devices, while maintaining accuracy even under time-out exceptions. Experiments show that MDP outperforms other baselines by 5.2×~2.1× in latency and 5.1×~1.7× in throughput with comparable accuracy.
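The abstract's combination of a boosting ensemble (sub-models summed to cancel residual errors) with a bagging fallback (averaging whatever outputs arrive before a deadline) can be illustrated with a minimal sketch. This is our own toy illustration, not the paper's implementation: the stub sub-models and the function `mdp_style_combine` are hypothetical names, and each sub-model is reduced to a simple linear map standing in for a vision-transformer branch on one edge device.

```python
import numpy as np

# Toy stand-ins for sub-models deployed on heterogeneous edge devices.
# In the paper these would be differently sized vision-transformer
# branches; here each one just scales its input.
def submodel_small(x):   # fast device, small model
    return 0.6 * x

def submodel_medium(x):  # medium device
    return 0.3 * x

def submodel_large(x):   # slow device, prone to time-outs
    return 0.1 * x

NUM_SUBMODELS = 3

def mdp_style_combine(outputs_received):
    """Combine whichever sub-model outputs arrived before the deadline.

    If every sub-model responded, sum the outputs (boosting ensemble:
    each sub-model was trained to correct the residual error left by
    the others).  If some timed out, average the available outputs
    (bagging ensemble): because each sub-model is also trained with
    vanilla supervised learning, it remains a usable predictor alone.
    """
    if len(outputs_received) == NUM_SUBMODELS:
        return sum(outputs_received)                      # boosting: residual sum
    return sum(outputs_received) / len(outputs_received)  # bagging fallback

x = np.array([1.0, 2.0])

# All three sub-models finish on time -> boosting ensemble.
full = mdp_style_combine([submodel_small(x), submodel_medium(x), submodel_large(x)])

# The slow device times out -> bagging fallback over the two survivors.
partial = mdp_style_combine([submodel_small(x), submodel_medium(x)])
```

Because the toy scale factors sum to 1.0, the full boosting path recovers the input exactly, while the fallback path degrades gracefully rather than failing, mirroring the paper's claim of maintained accuracy under time-out exceptions.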
Pages: 570-578
Page count: 9
Related Papers
50 records in total
  • [1] An Autonomous Parallelization of Transformer Model Inference on Heterogeneous Edge Devices
    Lee, Juhyeon
    Bahk, Insung
    Kim, Hoseung
    Jeong, Sinjin
    Lee, Suyeon
    Min, Donghyun
    PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024, 2024, : 50 - 61
  • [2] ViTA: A Vision Transformer Inference Accelerator for Edge Applications
    Nag, Shashank
    Datta, Gourav
    Kundu, Souvik
    Chandrachoodan, Nitin
    Beerel, Peter A.
    2023 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS, 2023,
  • [3] When the Edge Meets Transformers: Distributed Inference with Transformer Models
    Hu, Chenghao
    Li, Baochun
    2024 IEEE 44TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, ICDCS 2024, 2024, : 82 - 92
  • [4] Model and system robustness in distributed CNN inference at the edge
    Guo, Xiaotian
    Jiang, Quan
    Pimentel, Andy D.
    Stefanov, Todor
    INTEGRATION-THE VLSI JOURNAL, 2025, 100
  • [5] Parallelization of a distributed ecohydrological model
    Liu, Ning
    Shaikh, Mohsin Ahmed
    Kala, Jatin
    Harper, Richard J.
    Dell, Bernard
    Liu, Shirong
    Sun, Ge
    ENVIRONMENTAL MODELLING & SOFTWARE, 2018, 101 : 51 - 63
  • [6] Transformer Inference Acceleration in Edge Computing Environment
    Li, Mingchu
    Zhang, Wenteng
    Xia, Dexin
    2023 IEEE/ACM 23RD INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING WORKSHOPS, CCGRIDW, 2023, : 104 - 109
  • [7] On Model Coding for Distributed Inference and Transmission in Mobile Edge Computing Systems
    Zhang, Jingjing
    Simeone, Osvaldo
    IEEE COMMUNICATIONS LETTERS, 2019, 23 (06) : 1065 - 1068
  • [8] MODEL-DISTRIBUTED INFERENCE IN MULTI-SOURCE EDGE NETWORKS
    Li, Pengzhen
    Seferoglu, Hulya
    Koyuncu, Erdem
    2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023,
  • [9] Adaptive and Resilient Model-Distributed Inference in Edge Computing Systems
    Li, Pengzhen
    Koyuncu, Erdem
    Seferoglu, Hulya
    IEEE OPEN JOURNAL OF THE COMMUNICATIONS SOCIETY, 2023, 4 : 1263 - 1273
  • [10] Communication-Efficient Model Parallelism for Distributed In-Situ Transformer Inference
    Wei, Yuanxin
    Ye, Shengyuan
    Jiang, Jiazhi
    Chen, Xu
    Huang, Dan
    Du, Jiangsu
    Lu, Yutong
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,