Forward Learning of Large Language Models by Consumer Devices

Cited by: 3
Authors
Pau, Danilo Pietro [1]
Aymone, Fabrizio Maria [1]
Affiliations
[1] STMicroelectronics, Syst Res & Applicat, Via C Olivetti 2, I-20864 Agrate Brianza, Italy
Keywords
on-device learning; backpropagation; forward learning; PEPITA; MEMPEPITA; Large Language Models; Natural Language Processing;
DOI
10.3390/electronics13020402
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Large Language Models achieve state-of-the-art performance on a broad variety of Natural Language Processing tasks. In the pervasive IoT era, their deployment on edge devices is more compelling than ever. However, their gigantic model footprint has hindered on-device learning applications, which enable AI models to continuously learn and adapt to changes over time. Backpropagation, used by the majority of deep learning frameworks, is computationally intensive and requires storing intermediate activations in memory to compute the weight updates. Recently, "forward-only" algorithms have been proposed as biologically plausible alternatives: by replacing the backward pass with additional forward passes, this class of algorithms can achieve memory reductions by removing the need to store intermediate activations, at the expense of increased computational complexity. This paper considered three Large Language Models, DistilBERT, GPT-3 Small, and AlexaTM, and quantitatively investigated the improvements in memory usage and computational complexity brought by two known forward-only approaches, PEPITA and MEMPEPITA, with respect to backpropagation. For a low number of tokens in context, PEPITA marginally increases or substantially reduces the number of arithmetic operations, depending on the model; for a large number of tokens in context, PEPITA reduces computational complexity by 30% to 50%. MEMPEPITA increases PEPITA's complexity by one third. Regarding memory, PEPITA and backpropagation require a comparable amount to store activations, while MEMPEPITA reduces it by 50% to 94%, with the benefits being more evident for architectures with a long sequence of blocks. In various real-case scenarios, MEMPEPITA's memory reduction was essential for meeting the tight requirements of edge consumer devices equipped with 128 MB of memory, such as the multiprocessors commonly found in smartphones and industrial applications.
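To make the mechanism summarized in the abstract concrete, the sketch below illustrates a single PEPITA training step on a toy two-layer fully connected network in NumPy. This is a minimal illustration of the published PEPITA formulation (a second forward pass on an error-modulated input replaces backpropagation's backward pass), not code from this paper, which targets Transformer models; the sizes and names (W1, W2, F, and so on) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 32, 64, 10  # toy layer sizes (illustrative)

W1 = rng.normal(0.0, d_in ** -0.5, (d_hid, d_in))    # trainable layer 1
W2 = rng.normal(0.0, d_hid ** -0.5, (d_out, d_hid))  # trainable layer 2
F = rng.normal(0.0, d_in ** -0.5, (d_in, d_out))     # fixed random feedback projection

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.normal(size=d_in)  # one input sample
y = np.eye(d_out)[3]       # one-hot target
lr = 0.01

# Pass 1 (clean forward): compute the output error. PEPITA keeps h1 in
# memory, which is why its activation footprint is comparable to
# backpropagation's; MEMPEPITA would drop h1 here and recompute it with an
# extra forward pass, trading compute for memory.
h1 = relu(W1 @ x)
err = softmax(W2 @ h1) - y

# Pass 2 (modulated forward): the output error is projected back onto the
# input through the fixed random matrix F.
x_mod = x + F @ err
h1_mod = relu(W1 @ x_mod)

# Local, layer-wise updates: no backward pass and no stored gradient chain.
W1 -= lr * np.outer(h1 - h1_mod, x_mod)
W2 -= lr * np.outer(err, h1_mod)

Under this reading, the extra pass MEMPEPITA needs in order to recompute h1 is what the abstract prices at roughly one third more operations, in exchange for the 50% to 94% activation-memory savings it reports.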
Pages: 13
Related Papers
50 in total
  • [21] Low-Parameter Federated Learning with Large Language Models
    Jiang, Jingang
    Jiang, Haiqi
    Ma, Yuhan
    Liu, Xiangyang
    Fan, Chenyou
    WEB INFORMATION SYSTEMS AND APPLICATIONS, WISA 2024, 2024, 14883 : 319 - 330
  • [22] Multimodal large language models for inclusive collaboration learning tasks
    Lewis, Armanda
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2022, : 202 - 210
  • [23] Improving generalization in large language models by learning prefix subspaces
    Falissard, Louis
    Guigue, Vincent
    Soulier, Laure
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 11474 - 11483
  • [24] Learning to Retrieve In-Context Examples for Large Language Models
    Wang, Liang
    Yang, Nan
    Wei, Furu
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 1752 - 1767
  • [25] UniCode: Learning a Unified Codebook for Multimodal Large Language Models
    Zheng, Sipeng
    Zhou, Bohan
    Feng, Yicheng
    Wang, Ye
    Lu, Zongqing
    COMPUTER VISION - ECCV 2024, PT VIII, 2025, 15066 : 426 - 443
  • [26] Balancing Privacy and Robustness in Prompt Learning for Large Language Models
    Shi, Chiyu
    Su, Junyu
    Chu, Chiawei
    Wang, Baoping
    Feng, Duanyang
    MATHEMATICS, 2024, 12 (21)
  • [27] Adaptive In-Context Learning with Large Language Models for Bundle Generation
    Sun, Zhu
    Feng, Kaidong
    Yang, Jie
    Qu, Xinghua
    Fang, Hui
    Ong, Yew-Soon
    Liu, Wenyuan
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 966 - 976
  • [28] LLM-FIN: Large Language Models Fingerprinting Attack on Edge Devices
    Nazari, Najmeh
    Xiang, Furi
    Fang, Chongzhou
    Makrani, Hosein Mohammadi
    Puri, Aditya
    Patwari, Kartik
    Sayadi, Hossein
    Rafatirad, Setareh
    Chuah, Chen-Nee
    Homayoun, Houman
    2024 25TH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN, ISQED 2024, 2024
  • [29] Large Language Models are Not Models of Natural Language: They are Corpus Models
    Veres, Csaba
    IEEE ACCESS, 2022, 10 : 61970 - 61979
  • [30] Large Language Models
    Vargas, Diego Collarana
    Katsamanis, Nassos
    ERCIM NEWS, 2024, (136): 12 - 13