Recyclable Tuning for Continual Pre-training

Cited by: 0
Authors
Qin, Yujia [1 ]
Qian, Cheng [1 ]
Han, Xu [1 ]
Lin, Yankai [2 ]
Wang, Huadong [1 ]
Xie, Ruobing [3 ]
Li, Zhiyuan [1 ]
Sun, Maosong [1 ]
Zhou, Jie [3 ]
Institutions
[1] Tsinghua Univ, BNRIST, IAI, NLP Grp, DCST, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Beijing, Peoples R China
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023) | 2023
Funding
National Key R&D Program of China
DOI
Not available
Abstract
Continual pre-training is the paradigm where pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and are gradually upgraded. Before an upgraded PLM is released, we may have tuned the original PLM for various tasks and stored the adapted weights. However, when tuning the upgraded PLM, these outdated adapted weights are typically ignored and discarded, causing a potential waste of resources. We bring this issue to the forefront and contend that proper algorithms for recycling outdated adapted weights should be developed. To this end, we formulate the task of recyclable tuning for continual pre-training. In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent. Motivated by this finding, we analyze the connection between continually pre-trained PLMs from two novel aspects, i.e., mode connectivity and functional similarity. Based on the corresponding findings, we propose both an initialization-based method and a distillation-based method for our task. We demonstrate their feasibility in improving the convergence and performance of tuning the upgraded PLM. We also show that both methods can be combined to achieve better performance. The source codes are publicly available at https://github.com/thunlp/RecyclableTuning.
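The two recycling strategies named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the weight-delta formulation of initialization-based recycling, and the standard softened cross-entropy used for the distillation objective are all assumptions made here for clarity.

```python
import numpy as np

def recycle_init(theta_old, theta_old_adapted, theta_new):
    """Initialization-based recycling (sketch): carry the task-specific
    delta learned on the outdated PLM over to the upgraded PLM, and use
    the result as the starting point for tuning."""
    delta = theta_old_adapted - theta_old  # what task tuning changed
    return theta_new + delta               # recycled initialization

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Distillation-based recycling (sketch): treat the outdated adapted
    model as a teacher and penalize the softened cross-entropy between
    its predictions and the upgraded model's predictions."""
    def softmax(x):
        z = (x - x.max(axis=-1, keepdims=True)) / temperature
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
    p = softmax(teacher_logits)  # teacher: outdated adapted model
    q = softmax(student_logits)  # student: upgraded PLM being tuned
    return float(-(p * np.log(q + 1e-12)).sum(axis=-1).mean())
```

In this framing, `recycle_init` improves convergence by starting tuning closer to a task solution, while `distill_loss` is added to the task objective so the upgraded model can absorb knowledge from the outdated weights even when direct weight transfer is imperfect.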
Pages: 11403-11426 (24 pages)