Recyclable Tuning for Continual Pre-training

Cited by: 0
Authors
Qin, Yujia [1 ]
Qian, Cheng [1 ]
Han, Xu [1 ]
Lin, Yankai [2 ]
Wang, Huadong [1 ]
Xie, Ruobing [3 ]
Li, Zhiyuan [1 ]
Sun, Maosong [1 ]
Zhou, Jie [3 ]
Affiliations
[1] Tsinghua Univ, BNRIST, IAI, NLP Grp, DCST, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Beijing, Peoples R China
Funding
National Key Research and Development Program of China;
Abstract
Continual pre-training is the paradigm where pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and gradually get upgraded. Before an upgraded PLM is released, we may have tuned the original PLM for various tasks and stored the adapted weights. However, when tuning the upgraded PLM, these outdated adapted weights are typically ignored and discarded, causing a potential waste of resources. We bring this issue to the forefront and contend that proper algorithms for recycling outdated adapted weights should be developed. To this end, we formulate the task of recyclable tuning for continual pre-training. In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent. Motivated by this finding, we analyze the connection between continually pre-trained PLMs from two novel aspects, i.e., mode connectivity and functional similarity. Based on the corresponding findings, we propose both an initialization-based method and a distillation-based method for our task. We demonstrate their feasibility in improving both the convergence and the performance of tuning the upgraded PLM. We also show that the two methods can be combined to achieve better performance. The source code is publicly available at https://github.com/thunlp/RecyclableTuning.
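The abstract describes the two recycling strategies only at a high level. Below is a minimal PyTorch sketch of what they could look like in practice; it is not the authors' released implementation (see the GitHub link above). The model objects are assumed to follow a HuggingFace-style interface that returns `.logits`, and names such as `init_from_outdated_weights`, `distillation_step`, `upgraded_plm`, `outdated_adapted_state`, and `old_teacher` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

# Sketch of the two recycling strategies from the abstract (hypothetical names,
# not the authors' released API).

def init_from_outdated_weights(upgraded_plm, outdated_adapted_state):
    """Initialization-based recycling: start tuning the upgraded PLM from the
    outdated adapted weights instead of from scratch."""
    current = upgraded_plm.state_dict()
    # Only copy parameters whose names and shapes still match after the upgrade.
    compatible = {
        name: w for name, w in outdated_adapted_state.items()
        if name in current and current[name].shape == w.shape
    }
    current.update(compatible)
    upgraded_plm.load_state_dict(current)
    return upgraded_plm

def distillation_step(student, old_teacher, batch, optimizer, alpha=0.5, T=2.0):
    """Distillation-based recycling: regularize tuning of the upgraded PLM
    (student) toward the predictions of the outdated adapted model (teacher)."""
    student.train()
    optimizer.zero_grad()
    logits = student(**batch["inputs"]).logits
    task_loss = F.cross_entropy(logits, batch["labels"])
    with torch.no_grad():
        teacher_logits = old_teacher(**batch["inputs"]).logits
    # Temperature-scaled KL divergence between student and teacher distributions.
    kd_loss = F.kl_div(
        F.log_softmax(logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    loss = (1 - alpha) * task_loss + alpha * kd_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

As the abstract notes, the two strategies can also be combined, e.g., initializing the upgraded PLM from the outdated adapted weights and then training it with the distillation objective.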
Pages: 11403-11426
Page count: 24