Recyclable Tuning for Continual Pre-training

Cited by: 0
Authors
Qin, Yujia [1 ]
Qian, Cheng [1 ]
Han, Xu [1 ]
Lin, Yankai [2 ]
Wang, Huadong [1 ]
Xie, Ruobing [3 ]
Liu, Zhiyuan [1 ]
Sun, Maosong [1 ]
Zhou, Jie [3 ]
Affiliations
[1] Tsinghua Univ, BNRIST, IAI, NLP Grp, DCST, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Beijing, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
Subject classification code
Abstract
Continual pre-training is the paradigm where pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and gradually get upgraded. Before an upgraded PLM is released, we may have tuned the original PLM for various tasks and stored the adapted weights. However, when tuning the upgraded PLM, these outdated adapted weights are typically ignored and discarded, causing a potential waste of resources. We bring this issue to the forefront and contend that proper algorithms for recycling outdated adapted weights should be developed. To this end, we formulate the task of recyclable tuning for continual pre-training. In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent. Motivated by this finding, we analyze the connection between continually pre-trained PLMs from two novel aspects, i.e., mode connectivity and functional similarity. Based on the corresponding findings, we propose both an initialization-based method and a distillation-based method for our task. We demonstrate their feasibility in improving convergence and performance when tuning the upgraded PLM. We also show that both methods can be combined to achieve better performance. The source code is publicly available at https://github.com/thunlp/RecyclableTuning.
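For illustration only, the sketch below shows how the two recycling ideas described in the abstract could look in code: initialization-based recycling copies the outdated adapted weights as the starting point for tuning the upgraded PLM, and distillation-based recycling keeps the outdated adapted model as a teacher during tuning. It uses plain PyTorch with toy stand-in modules; the module names, the classifier-head setup, and the hyperparameters (alpha, temperature) are assumptions for this sketch, not details taken from the paper, whose actual implementation is in the linked repository.

```python
# Illustrative sketch only (not the authors' released code): recycling outdated
# adapted weights when tuning an upgraded PLM, with toy stand-in modules.
import torch
import torch.nn.functional as F
from torch import nn


class AdaptedModel(nn.Module):
    """A frozen backbone (the PLM) plus lightweight task-specific weights."""

    def __init__(self, backbone: nn.Module, hidden: int = 128, n_labels: int = 2):
        super().__init__()
        self.backbone = backbone                 # continually pre-trained PLM (kept frozen here)
        self.head = nn.Linear(hidden, n_labels)  # "adapted" weights tuned for the downstream task

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                    # keep the backbone fixed in this sketch
            h = self.backbone(x)
        return self.head(h)


hidden, n_labels = 128, 2
old_backbone = nn.Linear(hidden, hidden)         # stands in for the original PLM
new_backbone = nn.Linear(hidden, hidden)         # stands in for the upgraded PLM

# Outdated adapted model: the original PLM plus weights tuned before the upgrade.
old_model = AdaptedModel(old_backbone, hidden, n_labels)

# (1) Initialization-based recycling: start tuning the upgraded PLM from the
#     outdated adapted weights instead of a fresh random initialization.
new_model = AdaptedModel(new_backbone, hidden, n_labels)
new_model.head.load_state_dict(old_model.head.state_dict())

# (2) Distillation-based recycling: treat the outdated adapted model as a teacher
#     and match its softened predictions while tuning on labeled task data.
optimizer = torch.optim.AdamW(new_model.head.parameters(), lr=1e-3)
x = torch.randn(8, hidden)                       # toy batch of "features"
y = torch.randint(0, n_labels, (8,))             # toy task labels
alpha, temperature = 0.5, 2.0                    # assumed distillation hyperparameters

logits = new_model(x)
with torch.no_grad():
    teacher_logits = old_model(x)

kd_loss = F.kl_div(
    F.log_softmax(logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2
loss = (1 - alpha) * F.cross_entropy(logits, y) + alpha * kd_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In this toy setup the backbones are single linear layers standing in for the original and upgraded PLMs; with real models the same pattern would apply to whichever adapted parameters (e.g., classification heads or parameter-efficient delta modules) were stored before the upgrade, and the two strategies can be combined by using the recycled initialization together with the distillation loss.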
Pages: 11403 - 11426
Number of pages: 24