MDP: Model Decomposition and Parallelization of Vision Transformer for Distributed Edge Inference

Cited by: 0
Authors
Wang, Weiyan [1 ]
Zhang, Yiming [2 ]
Jin, Yilun [1 ]
Tian, Han [1 ]
Chen, Li [3 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Xiamen Univ, Xiamen, Peoples R China
[3] Zhongguancun Lab, Beijing, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Distributed edge inference; vision transformers; boosting ensemble;
DOI
10.1109/MSN60784.2023.00086
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Distributed edge inference has emerged as a promising paradigm for speeding up inference. Previous works realize it by physically partitioning CNNs, but vision transformers raise the following challenges: (1) high communication costs for the large model; (2) stragglers caused by heterogeneous devices; (3) time-out exceptions due to unstable edge devices. We therefore propose Model Decomposition and Parallelization (MDP), a novel scheme for large vision transformers. Inspired by the implicit boosting ensemble inside the vision transformer, MDP decomposes it into an explicit boosting ensemble of different, parallel sub-models and trains the sub-models sequentially to gradually reduce the residual errors. To minimize dependency and communication among sub-models, we adopt stacking distillation, which gives every sub-model extra information about the others for better error correction. Sub-models can take different image sizes as well as different model sizes, so they both fit heterogeneous devices and improve ensemble diversity. To handle time-out exceptions, we additionally train every sub-model with vanilla supervised learning, so that a bagging ensemble remains available if the boosting ensemble terminates early. As a result, all sub-models not only run in parallel with little communication but also adapt to heterogeneous devices, while maintaining accuracy even under time-out exceptions. Experiments show that MDP outperforms other baselines by 2.1x to 5.2x in latency and 1.7x to 5.1x in throughput with comparable accuracy.
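The abstract outlines an inference scheme (parallel heterogeneous sub-models, boosting-style aggregation, and a bagging fallback when devices time out) that a short sketch can make concrete. The Python sketch below only illustrates that flow under stated assumptions and is not the authors' implementation: the names SubModel and ensemble_predict, the stand-in sub-model architecture, the timeout value, and the sum/mean aggregation rules are all hypothetical.

```python
# Minimal sketch of the inference-time behaviour described in the abstract.
# NOT the authors' code: SubModel, ensemble_predict, the timeout value and
# the sum/mean aggregation rules are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import torch
import torch.nn as nn
import torch.nn.functional as F


class SubModel(nn.Module):
    """Stand-in for one decomposed sub-model. In MDP each sub-model would be
    a small vision transformer with its own input resolution and width,
    sized to fit a particular edge device."""

    def __init__(self, image_size: int, width: int, num_classes: int = 10):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * image_size * image_size, width),
            nn.GELU(),
            nn.Linear(width, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each device works at its own input resolution.
        x = F.interpolate(x, size=(self.image_size, self.image_size))
        return self.net(x)


def ensemble_predict(sub_models, x, timeout_s=1.0):
    """Run all sub-models in parallel (one worker per simulated device).
    If every sub-model answers in time, sum the outputs (boosting view:
    each sub-model corrects the residual error of the others). If some
    time out, average whatever arrived (bagging fallback, made possible
    by the extra vanilla supervised loss on every sub-model)."""
    with ThreadPoolExecutor(max_workers=len(sub_models)) as pool:
        futures = [pool.submit(lambda m=m: m(x)) for m in sub_models]
        outputs = []
        for f in futures:
            try:
                outputs.append(f.result(timeout=timeout_s))
            except TimeoutError:
                continue  # straggler or unstable device: drop its output
    if not outputs:
        raise RuntimeError("no sub-model finished before the deadline")
    stacked = torch.stack(outputs)
    if len(outputs) == len(sub_models):
        return stacked.sum(dim=0)   # full boosting ensemble
    return stacked.mean(dim=0)      # partial bagging ensemble


if __name__ == "__main__":
    # Heterogeneous sub-models: different image sizes and model widths.
    models = [SubModel(16, 64), SubModel(24, 128), SubModel(32, 256)]
    images = torch.randn(4, 3, 32, 32)
    logits = ensemble_predict(models, images)
    print(logits.shape)  # torch.Size([4, 10])
```

In a real MDP deployment the thread pool would stand in for RPC calls to the edge devices, and each SubModel would be one of the distilled ViT sub-models trained sequentially on the residual errors of the others.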
Pages: 570-578
Page count: 9
Related papers (50 records in total)
• [21] Lee, Seung Il; Koo, Kwanghyun; Lee, Jong Ho; Lee, Gilha; Jeong, Sangbeom; Seongjun, O.; Kim, Hyun. Vision transformer models for mobile/edge devices: a survey. MULTIMEDIA SYSTEMS, 2024, 30 (02).
• [22] Sato, Yuji; Midtlyng, Mads; Sato, Mikiko. Diffusely Distributed Parallelization of MOEA/D with Edge Weight Vectors Sharing. PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2023 COMPANION, 2023: 411-414.
• [23] Zuo, Xiaojiang; Luopan, Yaxin; Han, Rui; Zhang, Qinglong; Liu, Chi Harold; Wang, Guoren; Chen, Lydia Y. FedViT: Federated continual learning of vision transformer at edge. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 154: 1-15.
• [24] von Ramm, Alexander; Weismueller, Jens; Kurtz, Wolfgang; Neckel, Tobias. Comparing Domain Decomposition Methods for the Parallelization of Distributed Land Surface Models. COMPUTATIONAL SCIENCE - ICCS 2019, PT I, 2019, 11536: 197-210.
• [25] Graham, Ben; El-Nouby, Alaaeldin; Touvron, Hugo; Stock, Pierre; Joulin, Armand; Jegou, Herve; Douze, Matthijs. LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 12239-12249.
• [26] Wang, Li; Li, Liang; Xu, Lianming; Peng, Xian; Fei, Aiguo. Failure-Resilient Distributed Inference With Model Compression Over Heterogeneous Edge Devices. IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (12): 12680-12692.
• [27] Naveen, Soumyalatha; Kounte, Manjunath R.; Ahmed, Mohammed Riyaz. Low Latency Deep Learning Inference Model for Distributed Intelligent IoT Edge Clusters. IEEE ACCESS, 2021, 9: 160607-160621.
• [28] Reidy, Brendan; Mohammadi, Mohammadreza; Elbtity, Mohammed; Smith, Heath; Zand, Ramtin. Real-time Transformer Inference on Edge AI Accelerators. 2023 IEEE 29TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM, RTAS, 2023: 341-344.
• [29] Percebois, C.; Signes, N.; Agnoletto, P. A Compiler for a Distributed Inference Model. LECTURE NOTES IN COMPUTER SCIENCE, 1991, 487: 412-421.
• [30] Yin, Biao; Dridi, Mahjoub; El Moudni, Abdellah. Traffic Control Model and Algorithm Based on Decomposition of MDP. 2014 INTERNATIONAL CONFERENCE ON CONTROL, DECISION AND INFORMATION TECHNOLOGIES (CODIT), 2014: 225-230.