SDPIPE: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training

Cited by: 5
Authors
Miao, Xupeng [1 ]
Shi, Yining [2 ]
Yang, Zhi [2 ]
Cui, Bin [2 ]
Jia, Zhihao [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Peking Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2023 / Vol. 16 / No. 09
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
ALGORITHMS;
DOI
10.14778/3598581.3598604
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
The increasing size of both deep learning models and training data necessitates the ability to scale out model training through pipeline-parallel training, which combines pipelined model parallelism and data parallelism. However, most existing approaches assume an ideal, homogeneous, dedicated cluster. On real cloud clusters, these approaches suffer from intensive model synchronization overheads caused by dynamic environment heterogeneity. This challenge leaves the design in a dilemma: either the performance bottleneck of a central parameter server (PS), or the severe performance degradation caused by stragglers under decentralized synchronization (such as All-Reduce). This paper presents SDPIPE, a new semi-decentralized framework that gets the best of both worlds, achieving both high heterogeneity tolerance and convergence efficiency in pipeline-parallel training. To provide high performance, we decentralize the communication for model synchronization, which accounts for the largest proportion of synchronization overhead. In contrast, we centralize the process of group scheduling, which is lightweight but needs a global view for better performance and convergence speed under heterogeneity. We show via a prototype implementation the significant advantage of SDP... in performance and scalability across different environments.
Pages: 2354 - 2363
Page count: 10