LocMoE: A Low-Overhead MoE for Large Language Model Training

Cited by: 0
Authors
Li, Jing [1]
Sun, Zhijie [1]
He, Xuan [1]
Zeng, Li [1]
Lin, Yi [1]
Li, Entong [1]
Zheng, Binfan [1]
Zhao, Rongqian [1]
Chen, Xin [1]
Affiliations
[1] Huawei Technologies Co., Ltd., Shenzhen, Guangdong, People's Republic of China
Source
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024), 2024
Keywords
DOI
Not available
Chinese Library Classification
TP18 (Theory of artificial intelligence)
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Mixture-of-Experts (MoE) model is a widely used distributed and integrated learning method for large language models (LLMs), favored for its ability to sparsify and scale models efficiently. However, MoE performance is limited by load imbalance and the high latency of All-to-All communication, along with relatively redundant computation caused by large expert capacity. Load imbalance may result from existing routing policies that consistently tend to select certain experts. The frequent inter-node communication in the All-to-All procedure also significantly prolongs training time. To alleviate these performance problems, we propose a novel routing strategy that combines load balance and locality by converting part of the inter-node communication into intra-node communication. Notably, we show that there is a minimum threshold for expert capacity, calculated from the maximal angular deviation between the gating weights of the experts and the assigned tokens. We port these modifications onto the PanGu-Σ model based on the MindSpore framework with multi-level routing and conduct experiments on Ascend clusters. The experimental results demonstrate that the proposed LocMoE reduces training time per epoch by 12.68% to 22.24% compared with classical routers, such as the hash router and the switch router, without impacting model accuracy.
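The abstract describes the routing idea only at a high level. The following minimal, illustrative PyTorch-style sketch shows what a locality-biased top-1 router with a fixed expert capacity could look like; the function name, the locality_bonus knob, the tensor shapes, and the overflow handling are assumptions made for illustration, not the paper's actual LocMoE implementation.

import torch
import torch.nn.functional as F

def locality_biased_top1_router(tokens, expert_weights, local_experts, capacity, locality_bonus=0.1):
    # tokens:         [num_tokens, hidden]   token representations
    # expert_weights: [num_experts, hidden]  gating weight vector per expert
    # local_experts:  indices of experts hosted on the current node (assumed known)
    # capacity:       maximum number of tokens each expert may accept
    # locality_bonus: hypothetical additive score bonus for intra-node experts
    scores = tokens @ expert_weights.t()                 # token-expert affinity, [num_tokens, num_experts]

    # Bias experts on the same node so that part of the inter-node All-to-All
    # traffic becomes intra-node traffic.
    bias = torch.zeros(scores.shape[1])
    bias[list(local_experts)] = locality_bonus
    probs = F.softmax(scores + bias, dim=-1)

    # Top-1 assignment with a hard per-expert capacity; tokens that exceed the
    # capacity are simply dropped in this sketch.
    top_expert = probs.argmax(dim=-1)
    assignments = {}
    counts = torch.zeros(scores.shape[1], dtype=torch.long)
    for t, e in enumerate(top_expert.tolist()):
        if counts[e] < capacity:
            assignments.setdefault(e, []).append(t)
            counts[e] += 1
    return assignments, probs

# Toy usage: 8 tokens, hidden size 16, 4 experts, experts 0 and 1 local to this node.
tokens = torch.randn(8, 16)
expert_weights = torch.randn(4, 16)
assignments, _ = locality_biased_top1_router(tokens, expert_weights, {0, 1}, capacity=3)
print(assignments)

In this sketch the capacity is a fixed hyperparameter; the paper's contribution is, in part, deriving a lower bound for it from the maximal angular deviation between expert gating weights and assigned tokens.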
Pages: 6377-6387
Page count: 11