AI Computing Systems for Large Language Models Training

Cited: 0
Authors
Zhang, Zhen-Xing [1 ,2 ]
Wen, Yuan-Bo [2 ]
Lyu, Han-Qi [1 ,2 ,3 ]
Liu, Chang [3 ]
Zhang, Rui [2 ]
Li, Xia-Qing [2 ]
Wang, Chao [1 ]
Du, Zi-Dong [2 ,4 ]
Guo, Qi [2 ]
Li, Ling [5 ]
Zhou, Xue-Hai [1 ]
Chen, Yun-Ji [2 ,6 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing 100190, Peoples R China
[3] Cambricon Technol, Beijing 100191, Peoples R China
[4] Shanghai Innovat Ctr Processor Technol, Shanghai 201210, Peoples R China
[5] Chinese Acad Sci, Inst Software, Intelligent Software Res Ctr, Beijing 100190, Peoples R China
[6] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
artificial intelligence (AI) chip; large language model (LLM); AI computing system; accelerator; EFFICIENT;
DOI
10.1007/s11390-024-4178-1
CLC Classification Number
TP3 [Computing Technology, Computer Technology];
Subject Classification Number
0812;
Abstract
In this paper, we present a comprehensive overview of artificial intelligence (AI) computing systems for large language model (LLM) training. The rapid advancement of LLMs in recent years, coupled with the widespread adoption of algorithms and applications such as BERT, ChatGPT, and DeepSeek, has sparked significant interest in this field. We classify LLMs into encoder-only, encoder-decoder, and decoder-only models, and briefly analyze their training and inference processes to emphasize their substantial need for computational resources. These operations depend heavily on AI-specific accelerators such as GPUs (graphics processing units), TPUs (tensor processing units), and MLUs (machine learning units). However, as the gap widens between the increasing complexity of LLMs and the current capabilities of accelerators, it becomes essential to adopt heterogeneous computing systems optimized for distributed environments to manage the growing computational and memory requirements of LLMs. We delve into the execution and scheduling of LLM algorithms, underlining the critical role of distributed computing strategies, memory management enhancements, and improvements in computational efficiency. This paper clarifies the complex relationship between algorithm design, hardware infrastructure, and software optimization, provides an in-depth understanding of both the software and hardware infrastructure supporting LLM training, and offers insights into the challenges and potential avenues for future development and deployment.
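
The abstract points to distributed computing strategies as a central ingredient of LLM training. As an illustration only (not drawn from the paper itself), the sketch below shows the simplest such strategy, data parallelism, using PyTorch's DistributedDataParallel; the single transformer layer, random batch, and dummy loss are placeholders standing in for a real LLM and tokenized corpus.

# Illustrative sketch only: a minimal data-parallel training loop with PyTorch
# DistributedDataParallel (DDP). The tiny transformer layer and the random
# batch are placeholders, not the systems or models surveyed in the paper.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # One process per GPU; torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    # Tiny stand-in for an LLM block; a real model would additionally be
    # sharded (tensor/pipeline parallelism, optimizer-state partitioning).
    model = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).to(device)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        # Random activations stand in for an embedded, tokenized text batch.
        batch = torch.randn(8, 128, 512, device=device)
        loss = model(batch).pow(2).mean()  # dummy objective for illustration
        loss.backward()                    # DDP all-reduces gradients across ranks
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

A launch command such as torchrun --nproc_per_node=4 ddp_sketch.py (file name hypothetical) starts one process per local GPU; each process holds a full model replica, which is exactly the memory pressure that motivates the more advanced parallelism and memory-management techniques discussed in the paper.
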
Pages: 6-41
Number of Pages: 36