Recyclable Tuning for Continual Pre-training

Cited by: 0
Authors
Qin, Yujia [1 ]
Qian, Cheng [1 ]
Han, Xu [1 ]
Lin, Yankai [2 ]
Wang, Huadong [1 ]
Xie, Ruobing [3 ]
Li, Zhiyuan [1 ]
Sun, Maosong [1 ]
Zhou, Jie [3 ]
Institutions
[1] Tsinghua Univ, BNRIST, IAI, NLP Grp, DCST, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Beijing, Peoples R China
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023) | 2023
Funding
National Key R&D Program of China
DOI
Not available
Abstract
Continual pre-training is the paradigm where pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and are gradually upgraded. Before an upgraded PLM is released, we may have tuned the original PLM for various tasks and stored the adapted weights. However, when tuning the upgraded PLM, these outdated adapted weights are typically ignored and discarded, causing a potential waste of resources. We bring this issue to the forefront and contend that proper algorithms for recycling outdated adapted weights should be developed. To this end, we formulate the task of recyclable tuning for continual pre-training. In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent. Motivated by this finding, we analyze the connection between continually pre-trained PLMs from two novel aspects, i.e., mode connectivity and functional similarity. Based on the corresponding findings, we propose both an initialization-based method and a distillation-based method for our task. We demonstrate their feasibility in improving the convergence and performance of tuning the upgraded PLM. We also show that both methods can be combined to achieve better performance. The source codes are publicly available at https://github.com/thunlp/RecyclableTuning.
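The two recycling strategies named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the weight-delta formulation of initialization-based recycling, and the standard softened cross-entropy used for the distillation objective are all assumptions made here for clarity.

```python
import numpy as np

def recycle_init(theta_old, theta_old_adapted, theta_new):
    """Initialization-based recycling (sketch): carry the task-specific
    delta learned on the outdated PLM over to the upgraded PLM, and use
    the result as the starting point for tuning."""
    delta = theta_old_adapted - theta_old  # what task tuning changed
    return theta_new + delta               # recycled initialization

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Distillation-based recycling (sketch): treat the outdated adapted
    model as a teacher and penalize the softened cross-entropy between
    its predictions and the upgraded model's predictions."""
    def softmax(x):
        z = (x - x.max(axis=-1, keepdims=True)) / temperature
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
    p = softmax(teacher_logits)  # teacher: outdated adapted model
    q = softmax(student_logits)  # student: upgraded PLM being tuned
    return float(-(p * np.log(q + 1e-12)).sum(axis=-1).mean())
```

In this framing, `recycle_init` improves convergence by starting tuning closer to a task solution, while `distill_loss` is added to the task objective so the upgraded model can absorb knowledge from the outdated weights even when direct weight transfer is imperfect.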
Pages: 11403-11426 (24 pages)