Hardware-oriented algorithms for softmax and layer normalization of large language models

Cited by: 0
Authors
Li, Wenjie [1]
Lyu, Dongxu [1]
Wang, Gang [1]
Hu, Aokun [1]
Xu, Ningyi [1]
He, Guanghui [1, 2, 3]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200241, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Micro Nano Elect, Shanghai 200241, Peoples R China
[3] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, Shanghai 200241, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
large language model; softmax; layer normalization; hardware architecture; Transformer;
DOI
10.1007/s11432-024-4137-4
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Large language models (LLMs) have sparked a new revolution in natural language processing (NLP), and their hardware accelerators have garnered tremendous attention. However, softmax and layer normalization, the most common non-linear operations in LLMs, are frequently overlooked. This paper presents hardware-oriented algorithms for both softmax and layer normalization in LLMs. We propose an approximate approach to implementing division in softmax and extend it to compute the square root and perform the division simultaneously in layer normalization. The approach replaces the original computation with multiplication and shifting. For softmax, we further approximate the exponential function by truncating its exponent and then reuse the involved subtraction. For layer normalization, we additionally simplify the computation of the denominator by directly removing the term containing the square of the mean. Furthermore, hardware architectures are developed for the proposed softmax and layer normalization algorithms. They can work as plug-and-play units for LLM accelerators, requiring no fine-tuning and introducing negligible performance loss. Compared with state-of-the-art designs, the proposed softmax architecture saves up to 23.45% in area and 17.39% in power consumption, while the proposed layer normalization architecture saves up to 32.70% in area and 14.29% in power consumption.
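The abstract names two ideas that a small numerical sketch can make concrete: replacing division with a multiplication plus a shift, and dropping the squared-mean term from the layer normalization denominator. The Python below is only an illustration under stated assumptions, not the paper's algorithm. The function names are hypothetical, the reciprocal uses a generic textbook linear fit (relative error of roughly 1/17), and the exponential and square root are left exact because the abstract does not give enough detail to reproduce the paper's approximations of them.

```python
import math

def approx_reciprocal(d):
    """Approximate 1/d for d > 0 with one multiply-add and a power-of-two scale.

    Assumption: a generic linear fit, not the approximation used in the paper.
    """
    f, e = math.frexp(d)                      # d = f * 2**e with 0.5 <= f < 1
    r = 48.0 / 17.0 - (32.0 / 17.0) * f       # linear fit to 1/f on [0.5, 1)
    return math.ldexp(r, -e)                  # r * 2**(-e): the shift step

def softmax_approx(xs):
    """Softmax whose normalizing division is a multiply by approx_reciprocal."""
    m = max(xs)                               # subtract the max for stability
    exps = [math.exp(x - m) for x in xs]      # exact exponential (assumption)
    inv = approx_reciprocal(sum(exps))        # one reciprocal shared by all outputs
    return [v * inv for v in exps]

def layernorm_simplified(xs, eps=1e-5):
    """Layer normalization with the squared-mean term dropped from the denominator.

    Standard variance is E[x^2] - mean^2; here only E[x^2] is kept.
    The learned scale and bias are omitted, and eps is a hypothetical default.
    """
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n      # E[x^2]
    inv_std = 1.0 / math.sqrt(mean_sq + eps)  # exact square root (assumption)
    return [(x - mean) * inv_std for x in xs]
```

Because every softmax output is scaled by the same approximate reciprocal, the ordering of the outputs is unchanged and only their common scale is slightly off. The layer normalization shortcut is closest to the standard result when the input mean is small relative to its spread.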
Pages: 15
Related papers
50 in total
  • [1] Hardware-oriented algorithms for softmax and layer normalization of large language models
    Wenjie LI
    Dongxu LYU
    Gang WANG
    Aokun HU
    Ningyi XU
    Guanghui HE
    Science China(Information Sciences), 2024, 67 (10) : 85 - 99
  • [2] Simple hardware-oriented algorithms for cellular mobiles positioning
    Najiminaini, M.
    Doukhnitch, E.
    Salamah, M.
    2007 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE SERVICES, 2007: 157 - 160
  • [3] HARDWARE-ORIENTED ALGORITHMS FOR THE FAST SYMBOLIC CALCULATION OF THE DFT
    BETH, T
    FUMY, W
    ELECTRONICS LETTERS, 1983, 19 (21) : 901 - 902
  • [4] Hardware-oriented simplifications of the prediction algorithms in the H.265/HEVC encoder
    Trochimiuk, Maciej
    Abramowski, Andrzej
    PHOTONICS APPLICATIONS IN ASTRONOMY, COMMUNICATIONS, INDUSTRY, AND HIGH-ENERGY PHYSICS EXPERIMENTS 2014, 2014, 9290
  • [5] Hardware-oriented models for VLSI implementation of self-organizing maps
    MartindelBrio, B
    BlascoAlberto, J
    FROM NATURAL TO ARTIFICIAL NEURAL COMPUTATION, 1995, 930 : 712 - 719
  • [6] FPGA implementation of hardware-oriented reaction-diffusion cellular automata models
    Ishimura, Kazuyoshi
    Komuro, Katsuro
    Schmid, Alexandre
    Asai, Tetsuya
    Motomura, Masato
    IEICE NONLINEAR THEORY AND ITS APPLICATIONS, 2015, 6 (02): 252 - 262
  • [7] Novel hardware-oriented integer motion estimation algorithms for high efficiency video coding
    Nguyen Vu Thang
    Vu Dac Tung
    Nguyen Duc Hoan
    Dao Ba Anh
    PROCEEDINGS OF 2019 6TH NATIONAL FOUNDATION FOR SCIENCE AND TECHNOLOGY DEVELOPMENT (NAFOSTED) CONFERENCE ON INFORMATION AND COMPUTER SCIENCE (NICS), 2019: 160 - 165
  • [8] Hardware-oriented pruning and quantization of Deep Learning models to detect life-threatening arrhythmias
    Gonzalez-Carabarin, Lizeth
    Schmid, Alexandre
    Van Sloun, Ruud J. G.
    2021 IEEE BIOMEDICAL CIRCUITS AND SYSTEMS CONFERENCE (IEEE BIOCAS 2021), 2021,
  • [9] A Survey on Hardware Accelerators for Large Language Models
    Kachris, Christoforos
    APPLIED SCIENCES-BASEL, 2025, 15 (02):
  • [10] Hardware-oriented optimization of Bloom filter algorithms and architectures for ultra-high-speed lookups in network applications
    Sateesan, Arish
    Vliegen, Jo
    Daemen, Joan
    Mentens, Nele
    MICROPROCESSORS AND MICROSYSTEMS, 2022, 93