AI Computing Systems for Large Language Models Training

Cited by: 0
Authors
Zhang, Zhen-Xing [1 ,2 ]
Wen, Yuan-Bo [2 ]
Lyu, Han-Qi [1 ,2 ,3 ]
Liu, Chang [3 ]
Zhang, Rui [2 ]
Li, Xia-Qing [2 ]
Wang, Chao [1 ]
Du, Zi-Dong [2 ,4 ]
Guo, Qi [2 ]
Li, Ling [5 ]
Zhou, Xue-Hai [1 ]
Chen, Yun-Ji [2 ,6 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing 100190, Peoples R China
[3] Cambricon Technol, Beijing 100191, Peoples R China
[4] Shanghai Innovat Ctr Processor Technol, Shanghai 201210, Peoples R China
[5] Chinese Acad Sci, Inst Software, Intelligent Software Res Ctr, Beijing 100190, Peoples R China
[6] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
artificial intelligence (AI) chip; large language model (LLM); AI computing system; accelerator; EFFICIENT;
DOI
10.1007/s11390-024-4178-1
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
In this paper, we present a comprehensive overview of artificial intelligence (AI) computing systems for large language model (LLM) training. The rapid advancement of LLMs in recent years, coupled with the widespread adoption of algorithms and applications such as BERT, ChatGPT, and DeepSeek, has sparked significant interest in this field. We classify LLMs into encoder-only, encoder-decoder, and decoder-only models, and briefly analyze their training and inference processes to emphasize their substantial need for computational resources. These operations depend heavily on AI-specific accelerators such as GPUs (graphics processing units), TPUs (tensor processing units), and MLUs (machine learning units). However, as the gap widens between the increasing complexity of LLMs and the current capabilities of accelerators, it becomes essential to adopt heterogeneous computing systems optimized for distributed environments to manage the growing computational and memory requirements of LLMs. We delve into the execution and scheduling of LLM algorithms, underlining the critical role of distributed computing strategies, memory management enhancements, and computational efficiency improvements. This paper clarifies the complex relationship between algorithm design, hardware infrastructure, and software optimization, and provides an in-depth understanding of both the software and hardware infrastructure supporting LLM training, offering insights into the challenges and potential avenues for future development and deployment.
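As a rough illustration of why the abstract stresses distributed execution and memory management, the short Python sketch below estimates the memory needed just to hold model states (parameters, gradients, and Adam optimizer states) during mixed-precision training, using the common ~16 bytes-per-parameter rule of thumb, and the minimum number of accelerators required if those states are evenly sharded. The byte breakdown, the 80 GB device capacity, and the example model sizes are illustrative assumptions, not figures from the paper.

# Rough sketch (not from the paper): estimate model-state memory for
# mixed-precision Adam training and the minimum number of accelerators
# needed if those states are fully sharded across devices.

def model_state_bytes(num_params: int) -> int:
    """Memory for parameters, gradients, and Adam optimizer states.

    Assumes fp16 parameters (2 B) and gradients (2 B) plus fp32 master
    weights, momentum, and variance (4 B each) -- roughly 16 B/param.
    """
    return num_params * (2 + 2 + 4 + 4 + 4)

def min_devices(num_params: int, device_mem_gb: float = 80.0) -> int:
    """Lower bound on device count if model states are evenly sharded.

    Ignores activations, communication buffers, and fragmentation,
    which add substantially more memory pressure in practice.
    """
    total = model_state_bytes(num_params)
    per_device = int(device_mem_gb * 1024**3)
    return -(-total // per_device)  # ceiling division

if __name__ == "__main__":
    for n in (7e9, 70e9, 175e9):  # illustrative model sizes
        gb = model_state_bytes(int(n)) / 1024**3
        print(f"{n/1e9:>5.0f}B params: ~{gb:,.0f} GB of model states, "
              f">= {min_devices(int(n))} x 80 GB accelerators")

Even this lower bound reaches several terabytes for the largest models, before activations and communication buffers are counted; this gap between model-state size and single-accelerator memory is what motivates the distributed training strategies and memory optimizations the survey discusses.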
Pages: 6-41
Page count: 36
Related papers (50 in total; entries [21]-[30] shown)
  • [21] Extracting Training Data from Large Language Models
    Carlini, Nicholas
    Tramer, Florian
    Wallace, Eric
    Jagielski, Matthew
    Herbert-Voss, Ariel
    Lee, Katherine
    Roberts, Adam
    Brown, Tom
    Song, Dawn
    Erlingsson, Ulfar
    Oprea, Alina
    Raffel, Colin
    PROCEEDINGS OF THE 30TH USENIX SECURITY SYMPOSIUM, 2021, : 2633 - 2650
  • [22] LARGE MARGIN TRAINING IMPROVES LANGUAGE MODELS FOR ASR
    Wang, Jilin
    Huang, Jiaji
    Church, Kenneth Ward
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7368 - 7372
  • [23] Training Compute-Optimal Large Language Models
    Hoffmann, Jordan
    Borgeaud, Sebastian
    Mensch, Arthur
    Buchatskaya, Elena
    Cai, Trevor
    Rutherford, Eliza
    Casas, Diego de las
    Hendricks, Lisa Anne
    Welbl, Johannes
    Clark, Aidan
    Hennigan, Tom
    Noland, Eric
    Millican, Katie
    van den Driessche, George
    Damoc, Bogdan
    Guy, Aurelia
    Osindero, Simon
    Simonyan, Karen
    Elsen, Erich
    Vinyals, Oriol
    Rae, Jack W.
    Sifre, Laurent
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [24] Strategies for Training Large Vocabulary Neural Language Models
    Chen, Wenlin
    Grangier, David
    Auli, Michael
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1975 - 1985
  • [25] Targeted training for numerical reasoning with large language models
    Li, Xiao
    Liu, Sichen
    Zhu, Yin
    Cheng, Gong
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67 (01) : 197 - 221
  • [26] Emergent Structures and Training Dynamics in Large Language Models
    Teehan, Ryan
    Clinciu, Miruna
    Serikov, Oleg
    Szczechla, Eliza
    Seelam, Natasha
    Mirkin, Shachar
    Gokaslan, Aaron
    PROCEEDINGS OF WORKSHOP ON CHALLENGES & PERSPECTIVES IN CREATING LARGE LANGUAGE MODELS (BIGSCIENCE EPISODE #5), 2022, : 146 - 159
  • [27] Sparsity-Accelerated Training for Large Language Models
    Ma, Da
    Chen, Lu
    Wang, Pengyu
    Xu, Hongshen
    Li, Hanqi
    Sun, Liangtai
    Zhu, Su
    Fan, Shuai
    Yu, Kai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 14696 - 14707
  • [29] Ethics, Governance, and User Mental Models for Large Language Models in Computing Education
    Zhou, Kyrie Zhixuan
    Kilhoffer, Zachary
    Sanfilippo, Madelyn Rose
    Underwood, Ted
    Gumusel, Ece
    Wei, Mengyi
    Choudhry, Abhinav
    Xiong, Jinjun
    XRDS: Crossroads, 2024, 31 (01) : 46 - 51
  • [30] Improving Recommender Systems with Large Language Models
    Lubos, Sebastian
    ADJUNCT PROCEEDINGS OF THE 32ND ACM CONFERENCE ON USER MODELING, ADAPTATION AND PERSONALIZATION, UMAP 2024, 2024, : 40 - 44