Recyclable Tuning for Continual Pre-training

Cited by: 0
Authors
Qin, Yujia [1 ]
Qian, Cheng [1 ]
Han, Xu [1 ]
Lin, Yankai [2 ]
Wang, Huadong [1 ]
Xie, Ruobing [3 ]
Liu, Zhiyuan [1 ]
Sun, Maosong [1 ]
Zhou, Jie [3 ]
Affiliations
[1] Tsinghua Univ, BNRIST, IAI, NLP Grp, DCST, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Beijing, Peoples R China
Funding
National Key Research and Development Program of China;
DOI
Not available
Abstract
Continual pre-training is the paradigm in which pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and are gradually upgraded. Before an upgraded PLM is released, we may have tuned the original PLM for various tasks and stored the adapted weights. However, when tuning the upgraded PLM, these outdated adapted weights are typically ignored and discarded, causing a potential waste of resources. We bring this issue to the forefront and contend that proper algorithms for recycling outdated adapted weights should be developed. To this end, we formulate the task of recyclable tuning for continual pre-training. In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent. Motivated by this finding, we analyze the connection between continually pre-trained PLMs from two novel aspects, i.e., mode connectivity and functional similarity. Based on the corresponding findings, we propose both an initialization-based method and a distillation-based method for our task. We demonstrate their feasibility in improving the convergence and performance of tuning the upgraded PLM. We also show that both methods can be combined to achieve better performance. The source code is publicly available at https://github.com/thunlp/RecyclableTuning.
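To make the two recycling strategies concrete, the toy PyTorch sketch below stands in a small random network for the PLM backbone and treats an adapter plus classification head as the "adapted weights": initialization-based recycling reloads the outdated adapted weights before tuning on the upgraded backbone, and distillation-based recycling keeps the outdated adapted model as a frozen teacher. All class names, dimensions, and loss weights (e.g., AdaptedModel, kd_weight) are illustrative assumptions, not the authors' released implementation, which lives in the linked repository.

```python
# Minimal, self-contained sketch of the two recycling strategies; tiny random
# networks replace real PLM checkpoints so the snippet runs as-is.
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, NUM_CLASSES = 32, 4


def make_plm() -> nn.Module:
    """Stand-in for a pre-trained language model encoder."""
    return nn.Sequential(nn.Linear(DIM, DIM), nn.Tanh(), nn.Linear(DIM, DIM))


class AdaptedModel(nn.Module):
    """Frozen PLM backbone plus lightweight adapted weights (adapter + head)."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # only the adapted weights are tuned
        self.adapter = nn.Sequential(nn.Linear(DIM, 8), nn.ReLU(), nn.Linear(8, DIM))
        self.head = nn.Linear(DIM, NUM_CLASSES)

    def adapted_state(self):
        # The task-specific weights we may want to recycle later.
        return {k: v.clone() for k, v in self.state_dict().items()
                if not k.startswith("backbone.")}

    def forward(self, x):
        h = self.backbone(x)
        return self.head(h + self.adapter(h))


# Outdated side: adapted weights tuned on the original PLM (tuning omitted here).
original_plm = make_plm()
outdated_model = AdaptedModel(original_plm)
outdated_weights = outdated_model.adapted_state()

# Upgraded side: the PLM after continual pre-training, simulated by perturbation.
upgraded_plm = copy.deepcopy(original_plm)
with torch.no_grad():
    for p in upgraded_plm.parameters():
        p.add_(0.01 * torch.randn_like(p))

# (1) Initialization-based recycling: start the new adapted weights from the
#     outdated ones instead of from scratch.
student = AdaptedModel(upgraded_plm)
student.load_state_dict(outdated_weights, strict=False)

# (2) Distillation-based recycling: keep the outdated adapted model as a frozen
#     teacher and add a KD term to the task loss while tuning on the upgraded PLM.
teacher = outdated_model.eval()
optimizer = torch.optim.AdamW(
    [p for p in student.parameters() if p.requires_grad], lr=1e-3
)
temperature, kd_weight = 2.0, 0.5

for _ in range(100):  # toy loop on random data in place of a downstream task
    x = torch.randn(16, DIM)
    y = torch.randint(0, NUM_CLASSES, (16,))
    logits = student(x)
    with torch.no_grad():
        teacher_logits = teacher(x)
    task_loss = F.cross_entropy(logits, y)
    kd_loss = F.kl_div(
        F.log_softmax(logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    loss = task_loss + kd_weight * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice one would swap make_plm() for the original and continually pre-trained PLM checkpoints and the random batches for downstream task data; as the abstract notes, the two strategies can also be combined by using the recycled initialization together with the distillation loss.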
Pages: 11403 - 11426
Number of pages: 24
Related papers
50 records in total
  • [1] Continual pre-training mitigates forgetting in language and vision
    Cossu, Andrea
    Carta, Antonio
    Passaro, Lucia
    Lomonaco, Vincenzo
    Tuytelaars, Tinne
    Bacciu, Davide
    NEURAL NETWORKS, 2024, 179
  • [2] ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding
    Sun, Yu
    Wang, Shuohuan
    Li, Yukun
    Feng, Shikun
    Tian, Hao
    Wu, Hua
    Wang, Haifeng
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8968 - 8975
  • [3] Removing Backdoors in Pre-trained Models by Regularized Continual Pre-training
    Zhu, Biru
    Cui, Ganqu
    Chen, Yangyi
    Qin, Yujia
    Yuan, Lifan
    Fu, Chong
    Deng, Yangdong
    Liu, Zhiyuan
    Sun, Maosong
    Gu, Ming
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 1608 - 1623
  • [4] Continual Pre-Training of Python Language Model to mT5
    Kajiura, Teruno
    Souma, Nao
    Sato, Miyu
    Kuramitsu, Kimio
COMPUTER SOFTWARE, 2023, 40 (04): 10 - 21
  • [5] Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training
    Zhang, Haode
    Liang, Haowen
Zhan, Liming
    Lam, Albert Y. S.
    Wu, Xiao-Ming
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11105 - 11119
  • [6] A Continual Pre-training Approach to Tele-Triaging Pregnant Women in Kenya
    Zhang, Wenbo
    Guo, Hangzhi
    Ranganathan, Prerna
    Patel, Jay
    Rajasekharan, Sathyanath
    Danayak, Nidhi
    Gupta, Manan
    Yadav, Amulya
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023, : 14620 - 14627
  • [7] QUERT: Continual Pre-training of Language Model for Query Understanding in Travel Domain Search
    Xie, Jian
    Liang, Yidan
    Liu, Jingping
    Xiao, Yanghua
    Wu, Baohua
    Ni, Shenghua
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 5282 - 5291
  • [8] Continual Pre-Training of Language Models for Concept Prerequisite Learning with Graph Neural Networks
    Tang, Xin
    Liu, Kunjia
    Xu, Hao
    Xiao, Weidong
    Tan, Zhen
    MATHEMATICS, 2023, 11 (12)
  • [9] SAR-HUB: Pre-Training, Fine-Tuning, and Explaining
    Yang, Haodong
    Kang, Xinyue
    Liu, Long
    Liu, Yujiang
    Huang, Zhongling
    REMOTE SENSING, 2023, 15 (23)
  • [10] PSP: Pre-training and Structure Prompt Tuning for Graph Neural Networks
    Ge, Qingqing
    Zhao, Zeyuan
    Liu, Yiding
    Cheng, Anfeng
    Li, Xiang
    Wang, Shuaiqiang
    Yin, Dawei
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT V, ECML PKDD 2024, 2024, 14945 : 423 - 439