An Empirical Investigation of the Role of Pre-training in Lifelong Learning

Cited by: 0
Authors
Mehta, Sanket Vaibhav [1 ]
Patil, Darshan [2 ]
Chandar, Sarath [3 ]
Strubell, Emma [1 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Montreal, Mila Quebec AI Inst, Montreal, PQ H3T 1J4, Canada
[3] Ecole Polytech Montreal, Mila Quebec AI Inst, Canada CIFAR AI Chair, Montreal, PQ H3T 1J4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Lifelong Learning; Continual Learning; Pre-training; Flat Minima; Sharpness
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel data set of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach outperforms several state-of-the-art task-sequential continual learning algorithms across multiple settings, occasionally even without retaining a memory that scales in size with the number of tasks.
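The abstract's proposal of jointly optimizing the current task's loss and the sharpness of its loss basin during sequential fine-tuning is in the spirit of sharpness-aware minimization (SAM). The sketch below illustrates one such update under that assumption; the function name, the perturbation radius rho, and the surrounding training loop are illustrative choices, not the authors' released implementation.

```python
import torch

def sharpness_aware_step(model, loss_fn, batch, optimizer, rho=0.05):
    """One SAM-style update: perturb the weights toward the local worst case
    within an L2 ball of radius rho, then descend on the perturbed loss."""
    inputs, targets = batch

    # First forward/backward pass: gradient at the current weights.
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # epsilon = rho * g / ||g||: climb to the sharpest nearby point.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)            # w <- w + eps
            eps.append(e)
    optimizer.zero_grad()

    # Second pass: gradient of the perturbed (sharpness-aware) loss.
    loss_fn(model(inputs), targets).backward()

    # Undo the perturbation, then step with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In a sequential fine-tuning setting, this step would simply replace the standard optimizer step for each incoming task, starting from pre-trained weights; whether an episodic memory of past-task examples is mixed into `batch` is an orthogonal design choice.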
Pages: 50