Data Augmented Flatness-aware Gradient Projection for Continual Learning

Cited by: 5
Authors
Yang, Enneng [1]
Shen, Li [2]
Wang, Zhenyi [3]
Liu, Shiwei [4]
Guo, Guibing [1]
Wang, Xingwei [1]
Affiliations
[1] Northeastern Univ, Shenyang, Peoples R China
[2] JD Explore Acad, Beijing, Peoples R China
[3] Univ Maryland, Baltimore, MD 21201 USA
[4] Univ Texas Austin, Austin, TX 78712 USA
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV51070.2023.00518
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The goal of continual learning (CL) is to continuously learn new tasks without forgetting previously learned old tasks. To alleviate catastrophic forgetting, gradient projection based CL methods require that the gradient updates of new tasks are orthogonal to the subspace spanned by old tasks. This limits the learning process and leads to poor performance on the new task due to the projection constraint being too strong. In this paper, we first revisit the gradient projection method from the perspective of flatness of loss surface, and find that unflatness of the loss surface leads to catastrophic forgetting of the old tasks when the projection constraint is reduced to improve the performance of new tasks. Based on our findings, we propose a Data Augmented Flatness-aware Gradient Projection (DFGP) method to solve the problem, which consists of three modules: data and weight perturbation, flatness-aware optimization, and gradient projection. Specifically, we first perform a flatness-aware perturbation on the task data and current weights to find the case that makes the task loss worst. Next, flatness-aware optimization optimizes both the loss and the flatness of the loss surface on raw and worst-case perturbed data to obtain a flatness-aware gradient. Finally, gradient projection updates the network with the flatness-aware gradient along directions orthogonal to the subspace of the old tasks. Extensive experiments on four datasets show that our method improves the flatness of loss surface and the performance of new tasks, and achieves state-of-the-art (SOTA) performance in the average accuracy of all tasks.
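The three modules named in the abstract can be illustrated with a minimal NumPy sketch on a toy linear model. This is not the authors' implementation. The function names, the radii `rho_w` and `rho_x`, the one-step normalized-ascent perturbation, and the equal averaging of the two gradients are all assumptions made for illustration. `M` is assumed to hold an orthonormal basis of the old-task subspace as its columns.

```python
import numpy as np

def loss_and_grads(w, X, y):
    """Mean squared error of a linear model, with gradients w.r.t. w and X."""
    r = X @ w - y                      # residuals, shape (n,)
    loss = 0.5 * np.mean(r ** 2)
    grad_w = X.T @ r / len(y)          # gradient w.r.t. weights, shape (d,)
    grad_X = np.outer(r, w) / len(y)   # gradient w.r.t. inputs, shape (n, d)
    return loss, grad_w, grad_X

def unit(v, eps=1e-12):
    """Scale v to unit (Frobenius) norm; eps guards against division by zero."""
    return v / (np.linalg.norm(v) + eps)

def flatness_aware_projected_grad(w, X, y, M, rho_w=0.05, rho_x=0.05):
    """Hypothetical three-step update direction (illustrative assumptions only)."""
    # 1) Data and weight perturbation: one normalized ascent step on the
    #    weights and on the inputs approximates the locally worst case.
    _, g_w, g_X = loss_and_grads(w, X, y)
    w_adv = w + rho_w * unit(g_w)
    X_adv = X + rho_x * unit(g_X)

    # 2) Flatness-aware gradient: evaluate the gradient at the perturbed
    #    weights on both raw and perturbed data, then average (assumed weighting).
    _, g_raw, _ = loss_and_grads(w_adv, X, y)
    _, g_aug, _ = loss_and_grads(w_adv, X_adv, y)
    g = 0.5 * (g_raw + g_aug)

    # 3) Gradient projection: remove the component lying in span(M), so the
    #    update is orthogonal to the old-task subspace.
    return g - M @ (M.T @ g)
```

In this toy setting, an update `w - lr * g` built from the returned direction leaves the model output unchanged on any input that lies in the span of `M`, which is the property gradient projection relies on to protect old tasks.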
Pages: 5607-5616
Page count: 10