AI Computing Systems for Large Language Models Training

Cited by: 0
Authors
Zhang, Zhen-Xing [1 ,2 ]
Wen, Yuan-Bo [2 ]
Lyu, Han-Qi [1 ,2 ,3 ]
Liu, Chang [3 ]
Zhang, Rui [2 ]
Li, Xia-Qing [2 ]
Wang, Chao [1 ]
Du, Zi-Dong [2 ,4 ]
Guo, Qi [2 ]
Li, Ling [5 ]
Zhou, Xue-Hai [1 ]
Chen, Yun-Ji [2 ,6 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing 100190, Peoples R China
[3] Cambricon Technol, Beijing 100191, Peoples R China
[4] Shanghai Innovat Ctr Processor Technol, Shanghai 201210, Peoples R China
[5] Chinese Acad Sci, Inst Software, Intelligent Software Res Ctr, Beijing 100190, Peoples R China
[6] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
artificial intelligence (AI) chip; large language model (LLM); AI computing system; accelerator; EFFICIENT;
DOI
10.1007/s11390-024-4178-1
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
In this paper, we present a comprehensive overview of artificial intelligence (AI) computing systems for large language model (LLM) training. The rapid advancement of LLMs in recent years, coupled with the widespread adoption of algorithms and applications such as BERT, ChatGPT, and DeepSeek, has sparked significant interest in this field. We classify LLMs into encoder-only, encoder-decoder, and decoder-only models, and briefly analyze their training and inference processes to emphasize their substantial need for computational resources. These operations depend heavily on AI-specific accelerators such as GPUs (graphics processing units), TPUs (tensor processing units), and MLUs (machine learning units). However, as the gap widens between the increasing complexity of LLMs and the current capabilities of accelerators, it becomes essential to adopt heterogeneous computing systems optimized for distributed environments to manage the growing computational and memory requirements of LLMs. We delve into the execution and scheduling of LLM algorithms, underlining the critical role of distributed computing strategies, memory management enhancements, and computational efficiency improvements. This paper clarifies the complex relationship between algorithm design, hardware infrastructure, and software optimization, and provides an in-depth understanding of both the software and hardware infrastructure supporting LLM training, offering insights into the challenges and potential avenues for future development and deployment.
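As a rough illustration of why the abstract stresses distributed execution and memory management, the short Python sketch below estimates the memory needed just to hold model states (parameters, gradients, and Adam optimizer states) during mixed-precision training, using the common ~16 bytes-per-parameter rule of thumb, and the minimum number of accelerators required if those states are evenly sharded. The byte breakdown, the 80 GB device capacity, and the example model sizes are illustrative assumptions, not figures from the paper.

# Rough sketch (not from the paper): estimate model-state memory for
# mixed-precision Adam training and the minimum number of accelerators
# needed if those states are fully sharded across devices.

def model_state_bytes(num_params: int) -> int:
    """Memory for parameters, gradients, and Adam optimizer states.

    Assumes fp16 parameters (2 B) and gradients (2 B) plus fp32 master
    weights, momentum, and variance (4 B each) -- roughly 16 B/param.
    """
    return num_params * (2 + 2 + 4 + 4 + 4)

def min_devices(num_params: int, device_mem_gb: float = 80.0) -> int:
    """Lower bound on device count if model states are evenly sharded.

    Ignores activations, communication buffers, and fragmentation,
    which add substantially more memory pressure in practice.
    """
    total = model_state_bytes(num_params)
    per_device = int(device_mem_gb * 1024**3)
    return -(-total // per_device)  # ceiling division

if __name__ == "__main__":
    for n in (7e9, 70e9, 175e9):  # illustrative model sizes
        gb = model_state_bytes(int(n)) / 1024**3
        print(f"{n/1e9:>5.0f}B params: ~{gb:,.0f} GB of model states, "
              f">= {min_devices(int(n))} x 80 GB accelerators")

Even this lower bound reaches several terabytes for the largest models, before activations and communication buffers are counted; this gap between model-state size and single-accelerator memory is what motivates the distributed training strategies and memory optimizations the survey discusses.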
Pages: 6-41
Page count: 36
Related papers (50 in total; entries [21]-[30] shown)
  • [21] Extracting Training Data from Large Language Models
    Carlini, Nicholas
    Tramer, Florian
    Wallace, Eric
    Jagielski, Matthew
    Herbert-Voss, Ariel
    Lee, Katherine
    Roberts, Adam
    Brown, Tom
    Song, Dawn
    Erlingsson, Ulfar
    Oprea, Alina
    Raffel, Colin
    PROCEEDINGS OF THE 30TH USENIX SECURITY SYMPOSIUM, 2021, : 2633 - 2650
  • [22] LARGE MARGIN TRAINING IMPROVES LANGUAGE MODELS FOR ASR
    Wang, Jilin
    Huang, Jiaji
    Church, Kenneth Ward
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7368 - 7372
  • [23] Training Compute-Optimal Large Language Models
    Hoffmann, Jordan
    Borgeaud, Sebastian
    Mensch, Arthur
    Buchatskaya, Elena
    Cai, Trevor
    Rutherford, Eliza
    Casas, Diego de las
    Hendricks, Lisa Anne
    Welbl, Johannes
    Clark, Aidan
    Hennigan, Tom
    Noland, Eric
    Millican, Katie
    van den Driessche, George
    Damoc, Bogdan
    Guy, Aurelia
    Osindero, Simon
    Simonyan, Karen
    Elsen, Erich
    Vinyals, Oriol
    Rae, Jack W.
    Sifre, Laurent
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [24] Strategies for Training Large Vocabulary Neural Language Models
    Chen, Wenlin
    Grangier, David
    Auli, Michael
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1975 - 1985
  • [25] Targeted training for numerical reasoning with large language models
    Li, Xiao
    Liu, Sichen
    Zhu, Yin
    Cheng, Gong
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67 (01) : 197 - 221
  • [26] Emergent Structures and Training Dynamics in Large Language Models
    Teehan, Ryan
    Clinciu, Miruna
    Serikov, Oleg
    Szczechla, Eliza
    Seelam, Natasha
    Mirkin, Shachar
    Gokaslan, Aaron
    PROCEEDINGS OF WORKSHOP ON CHALLENGES & PERSPECTIVES IN CREATING LARGE LANGUAGE MODELS (BIGSCIENCE EPISODE #5), 2022, : 146 - 159
  • [27] Sparsity-Accelerated Training for Large Language Models
    Ma, Da
    Chen, Lu
    Wang, Pengyu
    Xu, Hongshen
    Li, Hanqi
    Sun, Liangtai
    Zhu, Su
    Fan, Shuai
    Yu, Kai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 14696 - 14707
  • [29] Ethics, Governance, and User Mental Models for Large Language Models in Computing Education
    Zhou, Kyrie Zhixuan
    Kilhoffer, Zachary
    Sanfilippo, Madelyn Rose
    Underwood, Ted
    Gumusel, Ece
    Wei, Mengyi
    Choudhry, Abhinav
    Xiong, Jinjun
    XRDS: Crossroads, 2024, 31 (01) : 46 - 51
  • [30] Improving Recommender Systems with Large Language Models
    Lubos, Sebastian
    ADJUNCT PROCEEDINGS OF THE 32ND ACM CONFERENCE ON USER MODELING, ADAPTATION AND PERSONALIZATION, UMAP 2024, 2024, : 40 - 44