Recyclable Tuning for Continual Pre-training

Cited by: 0
Authors
Qin, Yujia [1 ]
Qian, Cheng [1 ]
Han, Xu [1 ]
Lin, Yankai [2 ]
Wang, Huadong [1 ]
Xie, Ruobing [3 ]
Liu, Zhiyuan [1 ]
Sun, Maosong [1 ]
Zhou, Jie [3 ]
Affiliations
[1] Tsinghua Univ, BNRIST, IAI, NLP Grp, DCST, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Beijing, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
Subject classification code
Abstract
Continual pre-training is the paradigm where pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and gradually get upgraded. Before an upgraded PLM is released, we may have tuned the original PLM for various tasks and stored the adapted weights. However, when tuning the upgraded PLM, these outdated adapted weights are typically ignored and discarded, causing a potential waste of resources. We bring this issue to the forefront and contend that proper algorithms for recycling outdated adapted weights should be developed. To this end, we formulate the task of recyclable tuning for continual pre-training. In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent. Motivated by this finding, we analyze the connection between continually pre-trained PLMs from two novel aspects, i.e., mode connectivity and functional similarity. Based on the corresponding findings, we propose both an initialization-based method and a distillation-based method for our task. We demonstrate their feasibility in improving convergence and performance when tuning the upgraded PLM. We also show that both methods can be combined to achieve better performance. The source code is publicly available at https://github.com/thunlp/RecyclableTuning.
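For illustration only, the sketch below shows how the two recycling ideas described in the abstract could look in code: initialization-based recycling copies the outdated adapted weights as the starting point for tuning the upgraded PLM, and distillation-based recycling keeps the outdated adapted model as a teacher during tuning. It uses plain PyTorch with toy stand-in modules; the module names, the classifier-head setup, and the hyperparameters (alpha, temperature) are assumptions for this sketch, not details taken from the paper, whose actual implementation is in the linked repository.

```python
# Illustrative sketch only (not the authors' released code): recycling outdated
# adapted weights when tuning an upgraded PLM, with toy stand-in modules.
import torch
import torch.nn.functional as F
from torch import nn


class AdaptedModel(nn.Module):
    """A frozen backbone (the PLM) plus lightweight task-specific weights."""

    def __init__(self, backbone: nn.Module, hidden: int = 128, n_labels: int = 2):
        super().__init__()
        self.backbone = backbone                 # continually pre-trained PLM (kept frozen here)
        self.head = nn.Linear(hidden, n_labels)  # "adapted" weights tuned for the downstream task

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                    # keep the backbone fixed in this sketch
            h = self.backbone(x)
        return self.head(h)


hidden, n_labels = 128, 2
old_backbone = nn.Linear(hidden, hidden)         # stands in for the original PLM
new_backbone = nn.Linear(hidden, hidden)         # stands in for the upgraded PLM

# Outdated adapted model: the original PLM plus weights tuned before the upgrade.
old_model = AdaptedModel(old_backbone, hidden, n_labels)

# (1) Initialization-based recycling: start tuning the upgraded PLM from the
#     outdated adapted weights instead of a fresh random initialization.
new_model = AdaptedModel(new_backbone, hidden, n_labels)
new_model.head.load_state_dict(old_model.head.state_dict())

# (2) Distillation-based recycling: treat the outdated adapted model as a teacher
#     and match its softened predictions while tuning on labeled task data.
optimizer = torch.optim.AdamW(new_model.head.parameters(), lr=1e-3)
x = torch.randn(8, hidden)                       # toy batch of "features"
y = torch.randint(0, n_labels, (8,))             # toy task labels
alpha, temperature = 0.5, 2.0                    # assumed distillation hyperparameters

logits = new_model(x)
with torch.no_grad():
    teacher_logits = old_model(x)

kd_loss = F.kl_div(
    F.log_softmax(logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2
loss = (1 - alpha) * F.cross_entropy(logits, y) + alpha * kd_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In this toy setup the backbones are single linear layers standing in for the original and upgraded PLMs; with real models the same pattern would apply to whichever adapted parameters (e.g., classification heads or parameter-efficient delta modules) were stored before the upgrade, and the two strategies can be combined by using the recycled initialization together with the distillation loss.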
Pages: 11403 - 11426
Number of pages: 24