LocMoE: A Low-Overhead MoE for Large Language Model Training

Cited by: 0
Authors
Li, Jing [1]
Sun, Zhijie [1]
He, Xuan [1]
Zeng, Li [1]
Lin, Yi [1]
Li, Entong [1]
Zheng, Binfan [1]
Zhao, Rongqian [1]
Chen, Xin [1]
Affiliations
[1] Huawei Technologies Co., Ltd., Shenzhen, Guangdong, People's Republic of China
Source
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024), 2024
Keywords
DOI
Not available
Chinese Library Classification
TP18 (Theory of artificial intelligence)
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Mixture-of-Experts (MoE) model is a widely used distributed and integrated learning method for large language models (LLMs), favored for its ability to sparsify and scale models efficiently. However, MoE performance is limited by load imbalance and the high latency of All-to-All communication, along with relatively redundant computation caused by large expert capacity. Load imbalance may result from existing routing policies that consistently tend to select certain experts. The frequent inter-node communication in the All-to-All procedure also significantly prolongs training time. To alleviate these performance problems, we propose a novel routing strategy that combines load balance and locality by converting part of the inter-node communication into intra-node communication. Notably, we show that there is a minimum threshold for expert capacity, calculated from the maximal angular deviation between the gating weights of the experts and the assigned tokens. We port these modifications onto the PanGu-Σ model based on the MindSpore framework with multi-level routing and conduct experiments on Ascend clusters. The experimental results demonstrate that the proposed LocMoE reduces training time per epoch by 12.68% to 22.24% compared with classical routers, such as the hash router and the switch router, without impacting model accuracy.
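The abstract describes the routing idea only at a high level. The following minimal, illustrative PyTorch-style sketch shows what a locality-biased top-1 router with a fixed expert capacity could look like; the function name, the locality_bonus knob, the tensor shapes, and the overflow handling are assumptions made for illustration, not the paper's actual LocMoE implementation.

import torch
import torch.nn.functional as F

def locality_biased_top1_router(tokens, expert_weights, local_experts, capacity, locality_bonus=0.1):
    # tokens:         [num_tokens, hidden]   token representations
    # expert_weights: [num_experts, hidden]  gating weight vector per expert
    # local_experts:  indices of experts hosted on the current node (assumed known)
    # capacity:       maximum number of tokens each expert may accept
    # locality_bonus: hypothetical additive score bonus for intra-node experts
    scores = tokens @ expert_weights.t()                 # token-expert affinity, [num_tokens, num_experts]

    # Bias experts on the same node so that part of the inter-node All-to-All
    # traffic becomes intra-node traffic.
    bias = torch.zeros(scores.shape[1])
    bias[list(local_experts)] = locality_bonus
    probs = F.softmax(scores + bias, dim=-1)

    # Top-1 assignment with a hard per-expert capacity; tokens that exceed the
    # capacity are simply dropped in this sketch.
    top_expert = probs.argmax(dim=-1)
    assignments = {}
    counts = torch.zeros(scores.shape[1], dtype=torch.long)
    for t, e in enumerate(top_expert.tolist()):
        if counts[e] < capacity:
            assignments.setdefault(e, []).append(t)
            counts[e] += 1
    return assignments, probs

# Toy usage: 8 tokens, hidden size 16, 4 experts, experts 0 and 1 local to this node.
tokens = torch.randn(8, 16)
expert_weights = torch.randn(4, 16)
assignments, _ = locality_biased_top1_router(tokens, expert_weights, {0, 1}, capacity=3)
print(assignments)

In this sketch the capacity is a fixed hyperparameter; the paper's contribution is, in part, deriving a lower bound for it from the maximal angular deviation between expert gating weights and assigned tokens.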
Pages: 6377-6387
Page count: 11