Recyclable Tuning for Continual Pre-training

Cited by: 0
Authors
Qin, Yujia [1 ]
Qian, Cheng [1 ]
Han, Xu [1 ]
Lin, Yankai [2 ]
Wang, Huadong [1 ]
Xie, Ruobing [3 ]
Liu, Zhiyuan [1 ]
Sun, Maosong [1 ]
Zhou, Jie [3 ]
Affiliations
[1] Tsinghua Univ, BNRIST, IAI, NLP Grp, DCST, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Beijing, Peoples R China
Funding
National Key Research and Development Program of China;
DOI
Not available
Abstract
Continual pre-training is the paradigm in which pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and are gradually upgraded. Before an upgraded PLM is released, we may have tuned the original PLM for various tasks and stored the adapted weights. However, when tuning the upgraded PLM, these outdated adapted weights are typically ignored and discarded, causing a potential waste of resources. We bring this issue to the forefront and contend that proper algorithms for recycling outdated adapted weights should be developed. To this end, we formulate the task of recyclable tuning for continual pre-training. In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent. Motivated by this finding, we analyze the connection between continually pre-trained PLMs from two novel aspects, i.e., mode connectivity and functional similarity. Based on the corresponding findings, we propose both an initialization-based method and a distillation-based method for our task. We demonstrate their feasibility in improving the convergence and performance of tuning the upgraded PLM. We also show that both methods can be combined to achieve better performance. The source code is publicly available at https://github.com/thunlp/RecyclableTuning.
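To make the two recycling strategies concrete, the toy PyTorch sketch below stands in a small random network for the PLM backbone and treats an adapter plus classification head as the "adapted weights": initialization-based recycling reloads the outdated adapted weights before tuning on the upgraded backbone, and distillation-based recycling keeps the outdated adapted model as a frozen teacher. All class names, dimensions, and loss weights (e.g., AdaptedModel, kd_weight) are illustrative assumptions, not the authors' released implementation, which lives in the linked repository.

```python
# Minimal, self-contained sketch of the two recycling strategies; tiny random
# networks replace real PLM checkpoints so the snippet runs as-is.
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, NUM_CLASSES = 32, 4


def make_plm() -> nn.Module:
    """Stand-in for a pre-trained language model encoder."""
    return nn.Sequential(nn.Linear(DIM, DIM), nn.Tanh(), nn.Linear(DIM, DIM))


class AdaptedModel(nn.Module):
    """Frozen PLM backbone plus lightweight adapted weights (adapter + head)."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # only the adapted weights are tuned
        self.adapter = nn.Sequential(nn.Linear(DIM, 8), nn.ReLU(), nn.Linear(8, DIM))
        self.head = nn.Linear(DIM, NUM_CLASSES)

    def adapted_state(self):
        # The task-specific weights we may want to recycle later.
        return {k: v.clone() for k, v in self.state_dict().items()
                if not k.startswith("backbone.")}

    def forward(self, x):
        h = self.backbone(x)
        return self.head(h + self.adapter(h))


# Outdated side: adapted weights tuned on the original PLM (tuning omitted here).
original_plm = make_plm()
outdated_model = AdaptedModel(original_plm)
outdated_weights = outdated_model.adapted_state()

# Upgraded side: the PLM after continual pre-training, simulated by perturbation.
upgraded_plm = copy.deepcopy(original_plm)
with torch.no_grad():
    for p in upgraded_plm.parameters():
        p.add_(0.01 * torch.randn_like(p))

# (1) Initialization-based recycling: start the new adapted weights from the
#     outdated ones instead of from scratch.
student = AdaptedModel(upgraded_plm)
student.load_state_dict(outdated_weights, strict=False)

# (2) Distillation-based recycling: keep the outdated adapted model as a frozen
#     teacher and add a KD term to the task loss while tuning on the upgraded PLM.
teacher = outdated_model.eval()
optimizer = torch.optim.AdamW(
    [p for p in student.parameters() if p.requires_grad], lr=1e-3
)
temperature, kd_weight = 2.0, 0.5

for _ in range(100):  # toy loop on random data in place of a downstream task
    x = torch.randn(16, DIM)
    y = torch.randint(0, NUM_CLASSES, (16,))
    logits = student(x)
    with torch.no_grad():
        teacher_logits = teacher(x)
    task_loss = F.cross_entropy(logits, y)
    kd_loss = F.kl_div(
        F.log_softmax(logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    loss = task_loss + kd_weight * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice one would swap make_plm() for the original and continually pre-trained PLM checkpoints and the random batches for downstream task data; as the abstract notes, the two strategies can also be combined by using the recycled initialization together with the distillation loss.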
Pages: 11403 - 11426
Number of pages: 24
Related papers
50 records in total
  • [1] Continual pre-training mitigates forgetting in language and vision
    Cossu, Andrea
    Carta, Antonio
    Passaro, Lucia
    Lomonaco, Vincenzo
    Tuytelaars, Tinne
    Bacciu, Davide
    NEURAL NETWORKS, 2024, 179
  • [2] ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding
    Sun, Yu
    Wang, Shuohuan
    Li, Yukun
    Feng, Shikun
    Tian, Hao
    Wu, Hua
    Wang, Haifeng
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8968 - 8975
  • [3] Removing Backdoors in Pre-trained Models by Regularized Continual Pre-training
    Zhu, Biru
    Cui, Ganqu
    Chen, Yangyi
    Qin, Yujia
    Yuan, Lifan
    Fu, Chong
    Deng, Yangdong
    Liu, Zhiyuan
    Sun, Maosong
    Gu, Ming
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 1608 - 1623
  • [4] Continual Pre-Training of Python Language Model to mT5
    Kajiura, Teruno
    Souma, Nao
    Sato, Miyu
    Kuramitsu, Kimio
COMPUTER SOFTWARE, 2023, 40 (04): 10 - 21
  • [5] Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training
    Zhang, Haode
    Liang, Haowen
Zhan, Liming
    Lam, Albert Y. S.
    Wu, Xiao-Ming
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11105 - 11119
  • [6] A Continual Pre-training Approach to Tele-Triaging Pregnant Women in Kenya
    Zhang, Wenbo
    Guo, Hangzhi
    Ranganathan, Prerna
    Patel, Jay
    Rajasekharan, Sathyanath
    Danayak, Nidhi
    Gupta, Manan
    Yadav, Amulya
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023, : 14620 - 14627
  • [7] QUERT: Continual Pre-training of Language Model for Query Understanding in Travel Domain Search
    Xie, Jian
    Liang, Yidan
    Liu, Jingping
    Xiao, Yanghua
    Wu, Baohua
    Ni, Shenghua
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 5282 - 5291
  • [8] Continual Pre-Training of Language Models for Concept Prerequisite Learning with Graph Neural Networks
    Tang, Xin
    Liu, Kunjia
    Xu, Hao
    Xiao, Weidong
    Tan, Zhen
    MATHEMATICS, 2023, 11 (12)
  • [9] SAR-HUB: Pre-Training, Fine-Tuning, and Explaining
    Yang, Haodong
    Kang, Xinyue
    Liu, Long
    Liu, Yujiang
    Huang, Zhongling
    REMOTE SENSING, 2023, 15 (23)
  • [10] PSP: Pre-training and Structure Prompt Tuning for Graph Neural Networks
    Ge, Qingqing
    Zhao, Zeyuan
    Liu, Yiding
    Cheng, Anfeng
    Li, Xiang
    Wang, Shuaiqiang
    Yin, Dawei
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT V, ECML PKDD 2024, 2024, 14945 : 423 - 439