WActiGrad: Structured Pruning for Efficient Finetuning and Inference of Large Language Models on AI Accelerators

Cited by: 0
Authors
Chitty-Venkata, Krishna Teja [1 ]
Sastry, Varuni Katti [1 ]
Emani, Murali [1 ]
Vishwanath, Venkatram [1 ]
Shanmugavelu, Sanjif [2 ]
Howland, Sylvia [3 ]
Affiliations
[1] Argonne Natl Lab, Lemont, IL 60439 USA
[2] Groq Inc, Mountain View, CA USA
[3] Cerebras Syst, Sunnyvale, CA USA
Keywords
DOI
10.1007/978-3-031-69766-1_22
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) have shown remarkable performance across various language processing applications. Nevertheless, their extensive computational requirements can hinder their deployment in real-time applications or resource-constrained environments. Pruning is a powerful technique to reduce model size and make inference computationally efficient. In this paper, we propose a structured pruning algorithm, Weight Activation and Gradient (WActiGrad), to obtain smaller LLMs from large pre-trained models. We investigate the levels of granularity at which structured pruning can be applied to an LLM and identify the challenges of applying these techniques across different parts of the transformer. Based on these observations, we develop a pruning methodology that is adaptable to various attention and feedforward network modules. We comprehensively assess our WActiGrad method on state-of-the-art LLMs, LLaMA (7B and 13B), LLaMA-2 (7B and 13B), and Mistral-7B, across several post-pretraining language benchmarks. The approach can prune close to 20% of the original model size without significantly compromising validation accuracy. We evaluate the hardware performance of our structurally pruned LLMs on different AI accelerators, including the Nvidia A100 GPU, Groq LPU, Cerebras CS-2, and Graphcore Bow systems, to show the effectiveness of the structured pruning technique. The findings presented in this paper offer insights into deploying structured pruning techniques on AI accelerators.
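The abstract does not spell out the WActiGrad scoring formula. As an illustration only, the sketch below assumes a per-output-row importance score built from the element-wise product of weight and gradient magnitudes, scaled by average input-activation magnitudes; the function names, the keep_ratio parameter, and the row-level granularity are assumptions for this example, not the paper's exact method.

```python
# Hedged sketch: one plausible weight-activation-gradient importance score for
# structured (row-level) pruning of a linear layer. Not the paper's exact formula.
import torch
import torch.nn as nn

def wactigrad_row_scores(layer: nn.Linear, act_norm: torch.Tensor) -> torch.Tensor:
    """Score each output row of `layer` with |W| * |dL/dW|, weighted by the
    mean absolute input activation per input feature (`act_norm`, shape [in])."""
    assert layer.weight.grad is not None, "run a backward pass first"
    saliency = layer.weight.abs() * layer.weight.grad.abs()   # [out, in]
    return (saliency * act_norm.unsqueeze(0)).sum(dim=1)      # [out]

def prune_rows(layer: nn.Linear, scores: torch.Tensor, keep_ratio: float) -> nn.Linear:
    """Return a smaller Linear that keeps only the highest-scoring output rows."""
    n_keep = max(1, int(layer.out_features * keep_ratio))
    keep = scores.topk(n_keep).indices.sort().values
    new = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
    with torch.no_grad():
        new.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            new.bias.copy_(layer.bias[keep])
    return new

# Usage on a toy layer: gather activation statistics with a forward hook,
# backprop a loss to populate gradients, then keep ~80% of the rows.
layer = nn.Linear(64, 128)
acts = []
hook = layer.register_forward_hook(lambda m, inp, out: acts.append(inp[0].abs().mean(0)))
x = torch.randn(32, 64)
layer(x).sum().backward()
hook.remove()
scores = wactigrad_row_scores(layer, acts[0])
pruned = prune_rows(layer, scores, keep_ratio=0.8)
print(pruned)  # Linear(in_features=64, out_features=102, bias=True)
```

In a real transformer, the same idea would be applied per attention head or per feedforward channel rather than to an isolated linear layer, so that whole structures can be removed and the model remains dense for accelerator execution.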
Pages: 317-331
Page count: 15