Recyclable Tuning for Continual Pre-training

Cited by: 0
Authors
Qin, Yujia [1 ]
Qian, Cheng [1 ]
Han, Xu [1 ]
Lin, Yankai [2 ]
Wang, Huadong [1 ]
Xie, Ruobing [3 ]
Li, Zhiyuan [1 ]
Sun, Maosong [1 ]
Zhou, Jie [3 ]
Affiliations
[1] Tsinghua Univ, BNRIST, IAI, NLP Grp, DCST, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Beijing, Peoples R China
Funding
National Key Research and Development Program of China;
Abstract
Continual pre-training is the paradigm where pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and gradually get upgraded. Before an upgraded PLM is released, we may have tuned the original PLM for various tasks and stored the adapted weights. However, when tuning the upgraded PLM, these outdated adapted weights are typically ignored and discarded, causing a potential waste of resources. We bring this issue to the forefront and contend that proper algorithms for recycling outdated adapted weights should be developed. To this end, we formulate the task of recyclable tuning for continual pre-training. In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent. Motivated by this finding, we analyze the connection between continually pre-trained PLMs from two novel aspects, i.e., mode connectivity and functional similarity. Based on the corresponding findings, we propose both an initialization-based method and a distillation-based method for our task. We demonstrate their feasibility in improving both the convergence and the performance of tuning the upgraded PLM. We also show that the two methods can be combined to achieve better performance. The source code is publicly available at https://github.com/thunlp/RecyclableTuning.
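The abstract describes the two recycling strategies only at a high level. Below is a minimal PyTorch sketch of what they could look like in practice; it is not the authors' released implementation (see the GitHub link above). The model objects are assumed to follow a HuggingFace-style interface that returns `.logits`, and names such as `init_from_outdated_weights`, `distillation_step`, `upgraded_plm`, `outdated_adapted_state`, and `old_teacher` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

# Sketch of the two recycling strategies from the abstract (hypothetical names,
# not the authors' released API).

def init_from_outdated_weights(upgraded_plm, outdated_adapted_state):
    """Initialization-based recycling: start tuning the upgraded PLM from the
    outdated adapted weights instead of from scratch."""
    current = upgraded_plm.state_dict()
    # Only copy parameters whose names and shapes still match after the upgrade.
    compatible = {
        name: w for name, w in outdated_adapted_state.items()
        if name in current and current[name].shape == w.shape
    }
    current.update(compatible)
    upgraded_plm.load_state_dict(current)
    return upgraded_plm

def distillation_step(student, old_teacher, batch, optimizer, alpha=0.5, T=2.0):
    """Distillation-based recycling: regularize tuning of the upgraded PLM
    (student) toward the predictions of the outdated adapted model (teacher)."""
    student.train()
    optimizer.zero_grad()
    logits = student(**batch["inputs"]).logits
    task_loss = F.cross_entropy(logits, batch["labels"])
    with torch.no_grad():
        teacher_logits = old_teacher(**batch["inputs"]).logits
    # Temperature-scaled KL divergence between student and teacher distributions.
    kd_loss = F.kl_div(
        F.log_softmax(logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    loss = (1 - alpha) * task_loss + alpha * kd_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

As the abstract notes, the two strategies can also be combined, e.g., initializing the upgraded PLM from the outdated adapted weights and then training it with the distillation objective.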
Pages: 11403-11426
Page count: 24