Continual pre-training mitigates forgetting in language and vision

Citations: 0
Authors
Cossu, Andrea [1 ]
Carta, Antonio [1 ]
Passaro, Lucia [1 ]
Lomonaco, Vincenzo [1 ]
Tuytelaars, Tinne [2 ]
Bacciu, Davide [1 ]
Affiliations
[1] Univ Pisa, Largo B Pontecorvo 3, I-56127 Pisa, Italy
[2] Katholieke Univ Leuven, Kasteelpk Arenberg 10, B-3001 Leuven, Belgium
Funding
European Union Horizon 2020;
Keywords
Continual learning; Lifelong learning; Pre-training; Self-supervised; Forgetting;
DOI
10.1016/j.neunet.2024.106492
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-trained models are commonly used in Continual Learning to initialize the model before training on the stream of non-stationary data. However, pre-training itself is rarely applied during Continual Learning. We investigate the characteristics of the Continual Pre-Training scenario, where a model is continually pre-trained on a stream of incoming data and only later fine-tuned on different downstream tasks. We introduce an evaluation protocol for Continual Pre-Training which monitors forgetting against a Forgetting Control dataset not present in the continual stream. We disentangle the impact on forgetting of three main factors: the input modality (NLP, Vision), the architecture type (Transformer, ResNet) and the pre-training protocol (supervised, self-supervised). Moreover, we propose a Sample-Efficient Pre-training method (SEP) that speeds up the pre-training phase. We show that the pre-training protocol is the most important factor accounting for forgetting. Surprisingly, we find that self-supervised continual pre-training in both NLP and Vision is sufficient to mitigate forgetting without the use of any Continual Learning strategy. Other factors, such as model depth, input modality and architecture type, are not as crucial.
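The scenario outlined in the abstract (continual pre-training on a stream of experiences, later fine-tuning on a downstream task, and forgetting measured on a Forgetting Control dataset held out of the stream) can be summarized with a minimal sketch. The code below is illustrative only and not the authors' implementation: the toy reconstruction objective, the linear-probe fine-tuning, and all names (pretrain, finetune_and_eval, FEAT_DIM, the synthetic stream and control sets) are assumptions made for this example.

# Illustrative sketch of the Continual Pre-Training evaluation protocol
# (hypothetical names, synthetic data); not the authors' code.
import copy
import torch
from torch import nn

FEAT_DIM = 32  # toy feature dimensionality

def pretrain(encoder, experience, epochs=1, lr=1e-3):
    """One continual pre-training step on a single experience of the stream.
    A toy reconstruction loss stands in for the real (self-)supervised objective."""
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(epochs):
        for x in experience:
            loss = nn.functional.mse_loss(encoder(x), x)  # placeholder objective
            opt.zero_grad()
            loss.backward()
            opt.step()

def finetune_and_eval(encoder, train_set, test_set, num_classes=5, epochs=3, lr=1e-3):
    """Fine-tune a *copy* of the current encoder with a linear head on the
    Forgetting Control task and return its test accuracy."""
    clf = nn.Sequential(copy.deepcopy(encoder), nn.Linear(FEAT_DIM, num_classes))
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in train_set:
            loss = nn.functional.cross_entropy(clf(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    with torch.no_grad():
        correct = sum((clf(x).argmax(dim=1) == y).sum().item() for x, y in test_set)
        total = sum(len(y) for _, y in test_set)
    return correct / total

# Toy encoder and a stream of three pre-training experiences (batches of inputs).
encoder = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM))
stream = [[torch.randn(16, FEAT_DIM) for _ in range(10)] for _ in range(3)]

# Forgetting Control dataset: deliberately *not* part of the pre-training stream.
control_train = [(torch.randn(16, FEAT_DIM), torch.randint(0, 5, (16,))) for _ in range(10)]
control_test = [(torch.randn(16, FEAT_DIM), torch.randint(0, 5, (16,))) for _ in range(5)]

acc_before = finetune_and_eval(encoder, control_train, control_test)
for i, experience in enumerate(stream):
    pretrain(encoder, experience)  # continual pre-training, no CL strategy applied
    acc = finetune_and_eval(encoder, control_train, control_test)
    print(f"experience {i}: control accuracy {acc:.3f} (change {acc - acc_before:+.3f})")

In the paper's protocol, the drop in control-set performance across experiences is the forgetting signal; here it is printed after each pre-training experience.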
Pages: 14
Related Papers
50 in total
  • [1] Survey on Vision-language Pre-training
    Yin J.
    Zhang Z.-D.
    Gao Y.-H.
    Yang Z.-W.
    Li L.
    Xiao M.
    Sun Y.-Q.
    Yan C.-G.
    Ruan Jian Xue Bao/Journal of Software, 2023, 34 (05) : 2000 - 2023
  • [2] ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding
    Sun, Yu
    Wang, Shuohuan
    Li, Yukun
    Feng, Shikun
    Tian, Hao
    Wu, Hua
    Wang, Haifeng
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8968 - 8975
  • [3] RELATION ENHANCED VISION LANGUAGE PRE-TRAINING
    Lee, Ju-Hee
    Kang, Je-Won
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2286 - 2290
  • [4] Continual Pre-Training of Python Language Model to mT5
    Kajiura, Teruno
    Souma, Nao
    Sato, Miyu
    Kuramitsu, Kimio
    Computer Software, 2023, 40 (04) : 10 - 21
  • [5] VLP: A Survey on Vision-language Pre-training
    Chen, Fei-Long
    Zhang, Du-Zhen
    Han, Ming-Lun
    Chen, Xiu-Yi
    Shi, Jing
    Xu, Shuang
    Xu, Bo
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (01) : 38 - 56
  • [6] Recyclable Tuning for Continual Pre-training
    Qin, Yujia
    Qian, Cheng
    Han, Xu
    Lin, Yankai
    Wang, Huadong
    Xie, Ruobing
    Li, Zhiyuan
    Sun, Maosong
    Zhou, Jie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11403 - 11426
  • [7] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
    Jian, Yiren
    Gao, Chongyang
    Vosoughi, Soroush
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [8] Simultaneously Training and Compressing Vision-and-Language Pre-Training Model
    Qi, Qiaosong
    Zhang, Aixi
    Liao, Yue
    Sun, Wenyu
    Wang, Yongliang
    Li, Xiaobo
    Liu, Si
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8194 - 8203