Continual pre-training mitigates forgetting in language and vision

Cited by: 0
Authors
Cossu, Andrea [1 ]
Carta, Antonio [1 ]
Passaro, Lucia [1 ]
Lomonaco, Vincenzo [1 ]
Tuytelaars, Tinne [2 ]
Bacciu, Davide [1 ]
Affiliations
[1] Univ Pisa, Largo B Pontecorvo 3, I-56127 Pisa, Italy
[2] Katholieke Univ Leuven, Kasteelpk Arenberg 10, B-3001 Leuven, Belgium
Funding
European Union Horizon 2020
Keywords
Continual learning; Lifelong learning; Pre-training; Self-supervised; Forgetting
DOI
10.1016/j.neunet.2024.106492
Chinese Library Classification (CLC) code
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pre-trained models are commonly used in Continual Learning to initialize the model before training on the stream of non-stationary data. However, pre-training is rarely applied during Continual Learning. We investigate the characteristics of the Continual Pre-Training scenario, where a model is continually pre-trained on a stream of incoming data and only later fine-tuned to different downstream tasks. We introduce an evaluation protocol for Continual Pre-Training which monitors forgetting against a Forgetting Control dataset not present in the continual stream. We disentangle the impact of three main factors on forgetting: the input modality (NLP, Vision), the architecture type (Transformer, ResNet) and the pre-training protocol (supervised, self-supervised). Moreover, we propose a Sample-Efficient Pre-training method (SEP) that speeds up the pre-training phase. We show that the pre-training protocol is the most important factor accounting for forgetting. Surprisingly, we find that self-supervised continual pre-training in both NLP and Vision is sufficient to mitigate forgetting without the use of any Continual Learning strategy. Other factors, such as model depth, input modality and architecture type, are not as crucial.
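A minimal, illustrative sketch of the evaluation protocol described in the abstract is given below: a backbone is continually pre-trained on a stream of experiences, and after each experience a copy is fine-tuned and evaluated on a held-out Forgetting Control task to probe loss of downstream transfer. All names, the toy backbone, the synthetic data and the plain supervised placeholder objective are assumptions for illustration, not the authors' code; in the paper, the pre-training slot is filled by supervised or self-supervised objectives on real NLP/Vision data.

```python
# Sketch of the Continual Pre-Training evaluation loop (illustrative placeholders only).
import copy
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_loader(n=256, dim=32, classes=4):
    # Synthetic stand-in for a real pre-training / downstream dataset.
    x = torch.randn(n, dim)
    y = torch.randint(0, classes, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)


backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))


def pretrain_step(model, loader, epochs=1):
    # Placeholder objective (plain supervised cross-entropy); the paper studies
    # supervised and self-supervised pre-training protocols in this slot.
    head = nn.Linear(64, 4)
    opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            loss = F.cross_entropy(head(model(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()


def finetune_and_eval(model, loader, epochs=1):
    # Fine-tune a copy of the backbone on the Forgetting Control task and
    # report its accuracy, leaving the continually pre-trained backbone intact.
    probe = copy.deepcopy(model)
    head = nn.Linear(64, 4)
    opt = torch.optim.Adam(list(probe.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            loss = F.cross_entropy(head(probe(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            correct += (head(probe(x)).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total


stream = [make_loader() for _ in range(3)]  # non-stationary pre-training stream
control_loader = make_loader()              # Forgetting Control dataset, never in the stream

for i, experience in enumerate(stream):
    pretrain_step(backbone, experience)
    acc = finetune_and_eval(backbone, control_loader)
    print(f"after experience {i}: control-task accuracy = {acc:.3f}")
```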
Pages: 14