An Empirical Investigation of the Role of Pre-training in Lifelong Learning

Cited by: 0
Authors
Mehta, Sanket Vaibhav [1 ]
Patil, Darshan [2 ]
Chandar, Sarath [3 ]
Strubell, Emma [1 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Montreal, Mila Quebec AI Inst, Montreal, PQ H3T 1J4, Canada
[3] Ecole Polytech Montreal, Mila Quebec AI Inst, Canada CIFAR AI Chair, Montreal, PQ H3T 1J4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Lifelong Learning; Continual Learning; Pre-training; Flat Minima; Sharpness
DOI
Not available
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel data set of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach outperforms several state-of-the-art task-sequential continual learning algorithms across multiple settings, occasionally even without retaining a memory that scales in size with the number of tasks.
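The objective described in the abstract, jointly minimizing the current-task loss and the sharpness of its loss basin, is commonly formulated as a min-max problem of the form min_w max_{||eps|| <= rho} L_t(w + eps). The following is a minimal PyTorch-style sketch of one such sharpness-aware update, in the spirit of Sharpness-Aware Minimization; the function name, the rho value, and the surrounding training loop are illustrative assumptions, not necessarily the authors' exact procedure.

import torch

def sharpness_aware_step(model, loss_fn, batch, optimizer, rho=0.05):
    """One approximate update of min_w max_{||eps|| <= rho} L(w + eps)."""
    inputs, targets = batch

    # 1) Gradients at the current weights w.
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # 2) First-order ascent to the approximate worst-case point w + eps inside the rho-ball.
    grad_norm = torch.norm(
        torch.stack([p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]),
        p=2,
    )
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            eps = rho * p.grad / (grad_norm + 1e-12)
            p.add_(eps)                      # move weights toward higher loss
            perturbations.append((p, eps))
    optimizer.zero_grad()

    # 3) Gradients at the perturbed weights, then undo the perturbation
    #    and apply the descent step from the original weights.
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        for p, eps in perturbations:
            p.sub_(eps)                      # restore original weights
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

In sequential fine-tuning, this step would simply replace the standard gradient update for each incoming task; the two backward passes per step are the main added cost of encouraging wider basins.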
Pages: 50