MDP: Model Decomposition and Parallelization of Vision Transformer for Distributed Edge Inference

Authors
Wang, Weiyan [1]
Zhang, Yiming [2]
Jin, Yilun [1]
Tian, Han [1]
Chen, Li [3]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Xiamen Univ, Xiamen, Peoples R China
[3] Zhongguancun Lab, Beijing, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Distributed edge inference; vision transformers; boosting ensemble;
DOI
10.1109/MSN60784.2023.00086
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Distributed edge inference has emerged as a promising paradigm for speeding up inference. Previous works realize it by physically partitioning CNNs, but vision transformers pose the following challenges: (1) high communication costs because the model is large; (2) stragglers caused by heterogeneous devices; (3) time-out exceptions due to unstable edge devices. We therefore propose Model Decomposition and Parallelization (MDP) for large vision transformers. Inspired by the implicit boosting ensemble in the vision transformer, MDP decomposes it into an explicit boosting ensemble of different, parallel sub-models. It trains the sub-models sequentially so that each one further reduces the residual errors. To minimize dependency and communication among sub-models, we adopt stacking distillation, which gives every sub-model extra information about the others for better error correction. Different sub-models can take different image sizes and model sizes, which lets them run on heterogeneous devices and improves ensemble diversity. To handle time-out exceptions, we add vanilla supervised learning to every sub-model so that the sub-models can serve as a bagging ensemble if the boosting ensemble terminates early. As a result, all sub-models can run in parallel without much communication and can be adapted to heterogeneous devices, while maintaining accuracy even under time-out exceptions. Experiments show that MDP outperforms other baselines by 2.1x to 5.2x in latency and 1.7x to 5.1x in throughput with comparable accuracy.
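The time-out handling described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact aggregation rule, so everything below is an assumption made for illustration: the function names `aggregate` and `predict` are hypothetical, the boosting path is assumed to sum the sub-model logits, and the bagging fallback is assumed to average the logits of whichever sub-models returned in time.

```python
from typing import List, Optional, Sequence

Logits = Sequence[float]


def aggregate(outputs: Sequence[Optional[Logits]]) -> List[float]:
    """Combine per-sub-model logits.

    `None` marks a sub-model that missed the deadline (hypothetical convention).
    """
    available = [o for o in outputs if o is not None]
    if not available:
        raise ValueError("no sub-model returned before the deadline")
    num_classes = len(available[0])
    if len(available) == len(outputs):
        # Assumed boosting path: every sub-model arrived, so add up the
        # stage outputs, each of which corrects the residual of the others.
        return [sum(o[c] for o in available) for c in range(num_classes)]
    # Assumed bagging fallback: some sub-models timed out, so average the
    # survivors, which are each usable alone thanks to supervised training.
    return [sum(o[c] for o in available) / len(available)
            for c in range(num_classes)]


def predict(outputs: Sequence[Optional[Logits]]) -> int:
    """Return the index of the highest aggregated score."""
    scores = aggregate(outputs)
    return max(range(len(scores)), key=scores.__getitem__)


# All three sub-models responded: boosting-style sum.
full = [[2.0, 0.5], [-0.5, 1.0], [0.5, 0.0]]
# The second sub-model timed out: bagging-style average of the rest.
partial = [[2.0, 0.5], None, [0.5, 0.0]]
print(aggregate(full), aggregate(partial), predict(partial))
```

Because the fallback only needs the outputs that have already arrived, the coordinator in this sketch never waits on a straggler past its deadline; whether MDP uses this exact rule is not confirmed by the abstract.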
Pages: 570-578
Number of pages: 9