MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter

Cited: 0
Authors
Hao, Jitai [1]
Sun, Weiwei [1,2]
Xin, Xin [1]
Meng, Qi [3]
Chen, Zhumin [1]
Ren, Pengjie [1]
Ren, Zhaochun [4]
Affiliations
[1] Shandong Univ, Qingdao, Peoples R China
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Acad Math & Syst Sci, Beijing, Peoples R China
[4] Leiden Univ, Leiden, Netherlands
Funding
National Key R&D Program of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Parameter-Efficient Fine-Tuning (PEFT) facilitates the fine-tuning of Large Language Models (LLMs) under limited resources. However, fine-tuning performance with PEFT on complex, knowledge-intensive tasks is limited by constrained model capacity, which stems from the small number of additional trainable parameters. To overcome this limitation, we introduce a novel mechanism that fine-tunes LLMs with larger adapters while remaining memory-efficient. This is achieved by leveraging the inherent activation sparsity in the Feed-Forward Networks (FFNs) of LLMs and exploiting the larger capacity of Central Processing Unit (CPU) memory relative to Graphics Processing Unit (GPU) memory. We store and update the parameters of the larger adapters on the CPU. Moreover, we employ a Mixture-of-Experts (MoE)-like architecture to avoid unnecessary CPU computation and to reduce the communication volume between the GPU and CPU, which is particularly beneficial given the limited bandwidth of PCI Express (PCIe). Our method achieves fine-tuning results comparable to those obtained with larger memory capacities, even when operating under more limited resources such as a single GPU with 24 GB of memory, with an acceptable loss in training efficiency. Our code is available at https://github.com/CURRENTF/MEFT.
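The abstract describes a CPU-resident sparse adapter with MoE-like routing. The following is a minimal PyTorch sketch of that general idea, not the authors' released implementation: class and parameter names such as SparseCPUAdapter, adapter_size, and top_k are hypothetical. In the sketch, the wide adapter projections stay in CPU memory and only the top-k routed neuron rows are copied to the GPU each step, so PCIe traffic scales with k rather than with the full adapter width.

```python
import torch
import torch.nn as nn


class SparseCPUAdapter(nn.Module):
    """Hypothetical sketch: large adapter weights live on the CPU; a small
    GPU-side router selects the top-k adapter neurons whose rows are copied
    over PCIe for each forward pass."""

    def __init__(self, hidden_size: int, adapter_size: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        # Wide adapter projections stay in CPU memory (ideally pinned);
        # do not move these to the GPU when placing the rest of the model there.
        self.w_up = nn.Parameter(torch.randn(adapter_size, hidden_size) * 0.02)
        self.w_down = nn.Parameter(torch.zeros(adapter_size, hidden_size))
        # Small router, kept on the GPU together with the base model.
        self.router = nn.Linear(hidden_size, adapter_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size), resident on the GPU.
        scores = self.router(x)                       # (num_tokens, adapter_size)
        _, idx = scores.sum(dim=0).topk(self.top_k)   # shared top-k neuron ids
        idx_cpu = idx.cpu()
        # Copy only the selected rows across PCIe; gradients flow back to the
        # CPU-resident parameters, which are also updated on the CPU.
        w_up = self.w_up[idx_cpu].to(x.device, non_blocking=True)
        w_down = self.w_down[idx_cpu].to(x.device, non_blocking=True)
        h = torch.relu(x @ w_up.t())                  # (num_tokens, top_k)
        return x + h @ w_down                         # residual adapter output
```

Keeping the router on the GPU while the wide projections remain on the CPU mirrors the stated design: CPU memory supplies the extra capacity, and the MoE-style selection bounds both CPU computation and GPU-CPU transfer. For the actual method, refer to the released code at https://github.com/CURRENTF/MEFT.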
Pages: 2375-2388
Page count: 14
Related Papers
50 items in total
  • [11] A Memory-efficient Sparse Direct Solver with Applications in CEM
    Moshfegh, Javad
    Vouvakis, Marinos N.
    2017 IEEE INTERNATIONAL SYMPOSIUM ON ANTENNAS AND PROPAGATION & USNC/URSI NATIONAL RADIO SCIENCE MEETING, 2017, : 1577 - 1578
  • [12] FashionGPT: LLM instruction fine-tuning with multiple LoRA-adapter fusion
    Gao, Dehong
    Ma, Yufei
    Liu, Sen
    Song, Mengfei
    Jin, Linbo
    Jiang, Wen
    Wang, Xin
    Ning, Wei
    Yu, Shanqing
    Xuan, Qi
    Cai, Xiaoyan
    Yang, Libin
    KNOWLEDGE-BASED SYSTEMS, 2024, 299
  • [13] Sparse Bitmap Compression for Memory-Efficient Training on the Edge
    Hosny, Abdelrahman
    Neseem, Marina
    Reda, Sherief
    2021 ACM/IEEE 6TH SYMPOSIUM ON EDGE COMPUTING (SEC 2021), 2021, : 14 - 25
  • [14] Memory-Efficient Prompt Tuning for Incremental Histopathology Classification
    Zhu, Yu
    Li, Kang
    Yu, Lequan
    Heng, Pheng-Ann
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7802 - 7810
  • [15] Memory-Efficient Backpropagation Through Time
    Gruslys, Audrunas
    Munos, Remi
    Danihelka, Ivo
    Lanctot, Marc
    Graves, Alex
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [16] Composable Sparse Fine-Tuning for Cross-Lingual Transfer
    Ansell, Alan
    Ponti, Edoardo Maria
    Korhonen, Anna
    Vulic, Ivan
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 1778 - 1796
  • [17] Fine-tuning Image Transformers using Learnable Memory
    Sandler, Mark
    Zhmoginov, Andrey
    Vladymyrov, Max
    Jackson, Andrew
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 12145 - 12154
  • [18] How fine can fine-tuning be? Learning efficient language models
    Radiya-Dixit, Evani
    Wang, Xin
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 2435 - 2442
  • [19] Augmented Neural Fine-Tuning for Efficient Backdoor Purification
    Karim, Nazmul
    Al Arafat, Abdullah
    Khalid, Umar
    Guo, Zhishan
    Rahnavard, Nazanin
    COMPUTER VISION - ECCV 2024, PT LXXX, 2025, 15138 : 401 - 418
  • [20] Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
    Liao, Baohao
    Tan, Shaomu
    Monz, Christof
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,