Forward Learning of Large Language Models by Consumer Devices

Cited by: 3
Authors
Pau, Danilo Pietro [1 ]
Aymone, Fabrizio Maria [1 ]
Affiliations
[1] STMicroelectronics, Syst Res & Applicat, Via C Olivetti 2, I-20864 Agrate Brianza, Italy
Keywords
on-device learning; backpropagation; forward learning; PEPITA; MEMPEPITA; Large Language Models; Natural Language Processing;
DOI
10.3390/electronics13020402
Chinese Library Classification (CLC) number
TP [automation technology, computer technology];
Discipline classification code
0812;
Abstract
Large Language Models achieve state-of-the-art performance on a broad variety of Natural Language Processing tasks. In the pervasive IoT era, their deployment on edge devices is more compelling than ever. However, their gigantic model footprint has hindered on-device learning applications, which enable AI models to continuously learn and adapt to changes over time. Backpropagation, used by the majority of deep learning frameworks, is computationally intensive and requires storing intermediate activations in memory to compute the weight updates. Recently, "forward-only" algorithms have been proposed as biologically plausible alternatives. By performing additional forward passes, this class of algorithms can remove the need to store intermediate activations and thus reduce memory usage with respect to more naive forward-only approaches, at the expense of increased computational complexity. This paper considered three Large Language Models: DistilBERT, GPT-3 Small, and AlexaTM. It quantitatively investigated the improvements in memory usage and computational complexity brought by the known approaches PEPITA and MEMPEPITA with respect to backpropagation. For a low number of tokens in context, and depending on the model, PEPITA marginally increases or substantially reduces the number of arithmetic operations; for a large number of tokens in context, PEPITA reduces computational complexity by 30% to 50%. MEMPEPITA increases PEPITA's complexity by one third. Regarding memory, PEPITA and backpropagation require a comparable amount of memory to store activations, while MEMPEPITA reduces it by 50% to 94%, with the benefits being more evident for architectures with a long sequence of blocks. In various real-case scenarios, MEMPEPITA's memory reduction was essential to meet the tight memory requirements of edge consumer devices equipped with 128 MB of memory, which are commonly available as smartphone and industrial-application multiprocessors.
Pages: 13
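The abstract summarizes PEPITA, a forward-only training rule that replaces the backward pass with a second, error-modulated forward pass, and MEMPEPITA, a variant that recomputes activations instead of storing them. As a minimal illustration, and not code from the paper, the sketch below implements a PEPITA-style update for a tiny two-layer MLP in NumPy; the layer sizes, learning rate, and scale of the fixed random error-projection matrix F are illustrative assumptions.

```python
# Minimal sketch of a PEPITA-style update (Dellaferrera & Kreiman, 2022) for a
# two-layer MLP. Hyperparameters below are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 784, 256, 10, 0.01

W1 = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_hid, n_in))
W2 = rng.normal(0.0, np.sqrt(2.0 / n_hid), size=(n_out, n_hid))
F = 0.05 * rng.uniform(-1.0, 1.0, size=(n_in, n_out))  # fixed error-projection matrix

def relu(z):
    return np.maximum(z, 0.0)

def pepita_step(x, target):
    """One PEPITA training step: two forward passes, no backward pass."""
    global W1, W2
    # 1) Standard forward pass on the clean input.
    h1 = relu(W1 @ x)
    y = W2 @ h1
    e = y - target                              # output error

    # 2) Second forward pass on the error-modulated input.
    x_err = x + F @ e
    h1_err = relu(W1 @ x_err)

    # 3) Local, forward-only weight updates.
    W1 -= lr * np.outer(h1 - h1_err, x_err)     # hidden layer: activation difference
    W2 -= lr * np.outer(e, h1_err)              # output layer: delta-rule-like update
    return e

# Toy usage: one random sample with a one-hot target.
x = rng.normal(size=n_in)
t = np.zeros(n_out); t[3] = 1.0
err = pepita_step(x, t)
```

Note that the standard-pass activations (h1 here) must still be kept until the update, which is consistent with the abstract's statement that PEPITA and backpropagation need comparable activation memory; MEMPEPITA instead recomputes those activations with a further forward pass, matching the reported one-third increase in compute in exchange for the 50% to 94% reduction in activation storage.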