LocMoE: A Low-Overhead MoE for Large Language Model Training

Cited by: 0
Authors
Li, Jing [1 ]
Sun, Zhijie [1 ]
He, Xuan [1 ]
Zeng, Li [1 ]
Lin, Yi [1 ]
Li, Entong [1 ]
Zheng, Binfan [1 ]
Zhao, Rongqian [1 ]
Chen, Xin [1 ]
Affiliations
[1] Huawei Technologies Co., Ltd., Shenzhen, Guangdong, People's Republic of China
Source
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024), 2024
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The Mixture-of-Experts (MoE) model is a widely used distributed and ensemble learning method for large language models (LLMs), favored for its ability to sparsify and scale models efficiently. However, MoE performance is limited by load imbalance, the high latency of All-to-All communication, and relatively redundant computation caused by large expert capacity. Load imbalance arises because existing routing policies consistently tend to select certain experts, while frequent inter-node communication in the All-to-All procedure significantly prolongs training time. To alleviate these problems, we propose a novel routing strategy that combines load balance and locality by converting part of the inter-node communication into intra-node communication. Notably, we show that there is a minimum threshold for expert capacity, calculated from the maximal angular deviation between the gating weights of the experts and the assigned tokens. We port these modifications onto the PanGu-Σ model based on the MindSpore framework with multi-level routing and conduct experiments on Ascend clusters. The experimental results demonstrate that the proposed LocMoE reduces training time per epoch by 12.68% to 22.24% compared with classical routers, such as the hash router and the switch router, without impacting model accuracy.
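The abstract's central mechanism, a router that prefers experts co-located on the sending node while enforcing a fixed per-expert capacity, can be illustrated with a small sketch. The code below is a minimal, self-contained illustration of that general idea only; the function name route_tokens, the additive locality bias, and the greedy overflow handling are hypothetical choices made for exposition and are not taken from the LocMoE paper or the PanGu-Σ/MindSpore implementation.

```python
# Illustrative sketch only: a locality-biased top-1 router with a hard expert
# capacity. Names and the bias form are assumptions, NOT the LocMoE algorithm.
import numpy as np

def route_tokens(token_logits, expert_node, local_node, capacity, locality_bias=1.0):
    """Assign each token to one expert, preferring experts on the local node.

    token_logits : (num_tokens, num_experts) raw gating scores
    expert_node  : (num_experts,) node id hosting each expert
    local_node   : node id where the tokens currently reside
    capacity     : maximum number of tokens any single expert may accept
    """
    num_tokens, num_experts = token_logits.shape
    # Bias scores toward experts on the same node, reducing inter-node All-to-All traffic.
    biased = token_logits + locality_bias * (expert_node == local_node)
    # Softmax gating weights (used later to scale expert outputs; combine step not shown).
    gates = np.exp(biased - biased.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)

    assignment = np.full(num_tokens, -1, dtype=int)   # -1 means the token was dropped
    load = np.zeros(num_experts, dtype=int)
    # Greedy top-1 dispatch under a hard capacity limit, most confident tokens first.
    for t in np.argsort(-gates.max(axis=1)):
        for e in np.argsort(-gates[t]):               # try experts in preference order
            if load[e] < capacity:
                assignment[t] = e
                load[e] += 1
                break
    return assignment, load

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(16, 4))                 # 16 tokens, 4 experts
    nodes = np.array([0, 0, 1, 1])                    # experts 0-1 on node 0, experts 2-3 on node 1
    assign, load = route_tokens(logits, nodes, local_node=0, capacity=6)
    print("per-expert load:", load)                   # locality bias skews load toward node-0 experts
```

Raising locality_bias trades some routing freedom for fewer cross-node transfers, which is the trade-off the abstract describes; the paper's actual gating function, auxiliary losses, and capacity lower bound (derived from the angular deviation between gating weights and assigned tokens) differ from this sketch.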
Pages: 6377-6387
Page count: 11