An Empirical Investigation of the Role of Pre-training in Lifelong Learning

被引:0
|
作者
Mehta, Sanket Vaibhav [1 ]
Patil, Darshan [2 ]
Chandar, Sarath [3 ]
Strubell, Emma [1 ]
机构
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Montreal, Mila Quebec AI Inst, Montreal, PQ H3T 1J4, Canada
[3] Ecole Polytech Montreal, Mila Quebec AI Inst, Canada CIFAR AI Chair, Montreal, PQ H3T 1J4, Canada
基金
加拿大自然科学与工程研究理事会;
关键词
Lifelong Learning; Continual Learning; Pre; -training; Flat Minima; Sharpness;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel data set of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach outperforms several state-of-the-art task-sequential continual learning algorithms across multiple settings, occasionally even without retaining a memory that scales in size with the number of tasks.
引用
收藏
页数:50
相关论文
共 50 条
  • [41] Class incremental learning with self-supervised pre-training and prototype learning
    Liu, Wenzhuo
    Wu, Xin-Jian
    Zhu, Fei
    Yu, Ming-Ming
    Wang, Chuang
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2025, 157
  • [42] Rethinking ImageNet Pre-training
    He, Kaiming
    Girshick, Ross
    Dollar, Piotr
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4917 - 4926
  • [43] Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving
    Wang, Yunpeng
    Zheng, Kunxian
    Tian, Daxin
    Duan, Xuting
    Zhou, Jianshan
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2021, 22 (05) : 673 - 686
  • [44] Photo Pre-Training, But for Sketch
    Ke, L.
    Pang, Kaiyue
    Song, Yi-Zhe
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2754 - 2764
  • [45] Pre-Training to Learn in Context
    Gu, Yuxian
    Dong, Li
    Wei, Furu
    Huang, Minlie
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 4849 - 4870
  • [46] THE PRE-TRAINING SELECTION OF TEACHERS
    Barr, A. S.
    Douglas, Lois
    JOURNAL OF EDUCATIONAL RESEARCH, 1934, 28 (02): : 92 - 117
  • [47] Improving Fractal Pre-training
    Anderson, Connor
    Farrell, Ryan
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 2412 - 2421
  • [48] Pre-training via Paraphrasing
    Lewis, Mike
    Ghazvininejad, Marjan
    Ghosh, Gargi
    Aghajanyan, Armen
    Wang, Sida
    Zettlemoyer, Luke
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [49] Pre-training phenotyping classifiers
    Dligach, Dmitriy
    Afshar, Majid
    Miller, Timothy
    JOURNAL OF BIOMEDICAL INFORMATICS, 2021, 113 (113)
  • [50] Rethinking Pre-training and Self-training
    Zoph, Barret
    Ghiasi, Golnaz
    Lin, Tsung-Yi
    Cui, Yin
    Liu, Hanxiao
    Cubuk, Ekin D.
    Le, Quoc V.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33