Hardware-oriented algorithms for softmax and layer normalization of large language models

Times cited: 0

Authors:

Li, Wenjie [1]

Lyu, Dongxu [1]

Wang, Gang [1]

Hu, Aokun [1]

Xu, Ningyi [1]

He, Guanghui [1,2,3]

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200241, Peoples R China

[2] Shanghai Jiao Tong Univ, Dept Micro Nano Elect, Shanghai 200241, Peoples R China

[3] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, Shanghai 200241, Peoples R China

Source:

SCIENCE CHINA-INFORMATION SCIENCES | 2024 / Vol. 67 / No. 10 / pp. 85-99

Funding:

National Natural Science Foundation of China;

Keywords:

large language model; softmax; layer normalization; hardware architecture; Transformer;

DOI:

10.1007/s11432-024-4137-4

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Large language models (LLMs) have sparked a new revolution in natural language processing (NLP), and their hardware accelerators have attracted tremendous attention. However, softmax and layer normalization, the most common non-linear operations in LLMs, are frequently overlooked. This paper presents hardware-oriented algorithms for both softmax and layer normalization of LLMs. We propose an approximate approach to implementing division in softmax and extend it to compute the square root and perform the division in layer normalization simultaneously. The approach replaces the original computation with multiplication and shifting. For softmax, we further approximate the exponential function by truncating its exponent and then reuse the subtraction it involves. For layer normalization, we additionally simplify the computation of the denominator by directly removing the term involving the square of the mean. Furthermore, we develop hardware architectures for the proposed softmax and layer normalization algorithms. They can work as plug-and-play units for LLM accelerators, requiring no fine-tuning and introducing negligible performance loss. Compared with state-of-the-art designs, the proposed softmax architecture saves up to 23.45% area cost and 17.39% power consumption, while the proposed layer normalization architecture saves up to 32.70% area cost and 14.29% power consumption.
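The abstract does not give the exact formulas, so the Python sketch below only illustrates the two general ideas under stated assumptions; it is not the authors' algorithm. For division, it swaps in a generic textbook technique: write the divisor as d = m * 2^e with 0.5 <= m < 1, approximate 1/m with the linear Newton-Raphson seed 48/17 - (32/17)m (relative error at most 1/17), and apply 2^-e as a shift. For layer normalization, it assumes the variance is computed with the one-pass identity Var(x) = E[x^2] - mu^2 and that the removed "square of the mean" term is mu^2, which leaves sqrt(E[x^2] + eps) as the denominator. All function names are hypothetical, and the exponential is evaluated exactly (the exponent truncation is not modelled).

```python
import math


def approx_reciprocal(d: float) -> float:
    """Approximate 1/d (d > 0) with one multiply, one subtract and a shift.

    Generic Newton-Raphson seed, used only for illustration; not the
    paper's method. Relative error is at most 1/17 (about 5.9%).
    """
    if d <= 0:
        raise ValueError("d must be positive")
    m, e = math.frexp(d)                    # d = m * 2**e, 0.5 <= m < 1
    r = 48.0 / 17.0 - (32.0 / 17.0) * m     # linear approximation of 1/m
    return math.ldexp(r, -e)                # r * 2**-e, a shift in hardware


def softmax_approx(xs):
    """Softmax whose final division is replaced by multiplication."""
    m = max(xs)
    # exp is evaluated exactly; exponent truncation is not modelled here.
    es = [math.exp(x - m) for x in xs]
    inv = approx_reciprocal(sum(es))        # one reciprocal per vector
    return [v * inv for v in es]


def layernorm_exact(xs, eps=1e-5):
    """Reference layer normalization (no affine parameters)."""
    n = len(xs)
    mu = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    var = max(mean_sq - mu * mu, 0.0)       # one-pass identity E[x^2] - mu^2
    return [(x - mu) / math.sqrt(var + eps) for x in xs]


def layernorm_drop_mean_sq(xs, eps=1e-5):
    """Hypothetical simplification: drop the mu^2 term from the denominator."""
    n = len(xs)
    mu = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n    # E[x^2] used in place of Var(x)
    return [(x - mu) / math.sqrt(mean_sq + eps) for x in xs]


print(softmax_approx([1.0, 2.0, 3.0]))
print(layernorm_exact([1.0, -1.0, 2.0, -2.0]))
print(layernorm_drop_mean_sq([1.0, -1.0, 2.0, -2.0]))
```

For zero-mean inputs the two layer-normalization variants coincide. Because E[x^2] >= Var(x), the simplified denominator is never smaller than the exact one, so the error grows as the mean drifts away from zero.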

Pages: 15