Forward Learning of Large Language Models by Consumer Devices

Cited by: 3
Authors
Pau, Danilo Pietro [1]
Aymone, Fabrizio Maria [1]
Affiliations
[1] STMicroelectronics, Syst Res & Applicat, Via C Olivetti 2, I-20864 Agrate Brianza, Italy
Keywords
on-device learning; backpropagation; forward learning; PEPITA; MEMPEPITA; Large Language Models; Natural Language Processing;
DOI
10.3390/electronics13020402
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Large Language Models achieve state-of-the-art performance on a broad variety of Natural Language Processing tasks. In the pervasive IoT era, their deployment on edge devices is more compelling than ever. However, their gigantic model footprint has hindered on-device learning applications, which enable AI models to continuously learn and adapt to changes over time. Backpropagation, used by the majority of deep learning frameworks, is computationally intensive and requires storing intermediate activations in memory to compute the weight updates. Recently, "forward-only" algorithms have been proposed as biologically plausible alternatives: by replacing the backward pass with additional forward passes, this class of algorithms can achieve memory reductions by removing the need to store intermediate activations, at the expense of increased computational complexity. This paper considered three Large Language Models, DistilBERT, GPT-3 Small, and AlexaTM, and quantitatively investigated the improvements in memory usage and computational complexity brought by two known forward-only approaches, PEPITA and MEMPEPITA, with respect to backpropagation. For a low number of tokens in context, PEPITA marginally increases or substantially reduces the number of arithmetic operations, depending on the model; for a large number of tokens in context, PEPITA reduces computational complexity by 30% to 50%. MEMPEPITA increases PEPITA's complexity by one third. Regarding memory, PEPITA and backpropagation require a comparable amount to store activations, while MEMPEPITA reduces it by 50% to 94%, with the benefits being more evident for architectures with a long sequence of blocks. In various real-case scenarios, MEMPEPITA's memory reduction was essential for meeting the tight requirements of edge consumer devices equipped with 128 MB of memory, such as the multiprocessors commonly found in smartphones and industrial applications.
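To make the mechanism summarized in the abstract concrete, the sketch below illustrates a single PEPITA training step on a toy two-layer fully connected network in NumPy. This is a minimal illustration of the published PEPITA formulation (a second forward pass on an error-modulated input replaces backpropagation's backward pass), not code from this paper, which targets Transformer models; the sizes and names (W1, W2, F, and so on) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 32, 64, 10  # toy layer sizes (illustrative)

W1 = rng.normal(0.0, d_in ** -0.5, (d_hid, d_in))    # trainable layer 1
W2 = rng.normal(0.0, d_hid ** -0.5, (d_out, d_hid))  # trainable layer 2
F = rng.normal(0.0, d_in ** -0.5, (d_in, d_out))     # fixed random feedback projection

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.normal(size=d_in)  # one input sample
y = np.eye(d_out)[3]       # one-hot target
lr = 0.01

# Pass 1 (clean forward): compute the output error. PEPITA keeps h1 in
# memory, which is why its activation footprint is comparable to
# backpropagation's; MEMPEPITA would drop h1 here and recompute it with an
# extra forward pass, trading compute for memory.
h1 = relu(W1 @ x)
err = softmax(W2 @ h1) - y

# Pass 2 (modulated forward): the output error is projected back onto the
# input through the fixed random matrix F.
x_mod = x + F @ err
h1_mod = relu(W1 @ x_mod)

# Local, layer-wise updates: no backward pass and no stored gradient chain.
W1 -= lr * np.outer(h1 - h1_mod, x_mod)
W2 -= lr * np.outer(err, h1_mod)

Under this reading, the extra pass MEMPEPITA needs in order to recompute h1 is what the abstract prices at roughly one third more operations, in exchange for the 50% to 94% activation-memory savings it reports.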
Pages: 13
Related Papers
50 in total
  • [21] Low-Parameter Federated Learning with Large Language Models
    Jiang, Jingang
    Jiang, Haiqi
    Ma, Yuhan
    Liu, Xiangyang
    Fan, Chenyou
    WEB INFORMATION SYSTEMS AND APPLICATIONS, WISA 2024, 2024, 14883 : 319 - 330
  • [22] Multimodal large language models for inclusive collaboration learning tasks
    Lewis, Armanda
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2022, : 202 - 210
  • [23] Improving generalization in large language models by learning prefix subspaces
    Falissard, Louis
    Guigue, Vincent
    Soulier, Laure
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 11474 - 11483
  • [24] Learning to Retrieve In-Context Examples for Large Language Models
    Wang, Liang
    Yang, Nan
    Wei, Furu
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 1752 - 1767
  • [25] UniCode: Learning a Unified Codebook for Multimodal Large Language Models
    Zheng, Sipeng
    Zhou, Bohan
    Feng, Yicheng
    Wang, Ye
    Lu, Zongqing
    COMPUTER VISION - ECCV 2024, PT VIII, 2025, 15066 : 426 - 443
  • [26] Balancing Privacy and Robustness in Prompt Learning for Large Language Models
    Shi, Chiyu
    Su, Junyu
    Chu, Chiawei
    Wang, Baoping
    Feng, Duanyang
    MATHEMATICS, 2024, 12 (21)
  • [27] Adaptive In-Context Learning with Large Language Models for Bundle Generation
    Sun, Zhu
    Feng, Kaidong
    Yang, Jie
    Qu, Xinghua
    Fang, Hui
    Ong, Yew-Soon
    Liu, Wenyuan
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 966 - 976
  • [28] LLM-FIN: Large Language Models Fingerprinting Attack on Edge Devices
    Nazari, Najmeh
    Xiang, Furi
    Fang, Chongzhou
    Makrani, Hosein Mohammadi
    Puri, Aditya
    Patwari, Kartik
    Sayadi, Hossein
    Rafatirad, Setareh
    Chuah, Chen-Nee
    Homayoun, Houman
    2024 25TH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN, ISQED 2024, 2024
  • [29] Large Language Models are Not Models of Natural Language: They are Corpus Models
    Veres, Csaba
    IEEE ACCESS, 2022, 10 : 61970 - 61979
  • [30] Large Language Models
    Vargas, Diego Collarana
    Katsamanis, Nassos
    ERCIM NEWS, 2024, (136): 12 - 13