Measuring Asymmetric Gradient Discrepancy in Parallel Continual Learning

Times Cited: 2
Authors:
Lyu, Fan [1]
Sun, Qing [1]
Shang, Fanhua [1]
Wan, Liang [1]
Feng, Wei [1]
Affiliations:
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
Funding:
National Natural Science Foundation of China
DOI: 10.1109/ICCV51070.2023.01048
Chinese Library Classification (CLC): TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
In Parallel Continual Learning (PCL), multiple parallel tasks start and end training at unpredictable times, so training suffers from both training conflict and catastrophic forgetting. Both issues arise because the gradients of parallel tasks differ in direction and magnitude. In this paper, we therefore formulate PCL as a minimum-distance optimization problem among gradients and propose an explicit Asymmetric Gradient Distance (AGD) to evaluate gradient discrepancy in PCL. AGD accounts for both gradient magnitude ratios and directions, and it tolerates updates along a small gradient of inverse direction, which reduces the imbalanced influence of gradients on parallel task training. Moreover, we present a novel Maximum Discrepancy Optimization (MaxDO) strategy that minimizes the maximum discrepancy among multiple gradients. By solving MaxDO with AGD, parallel training reduces the influence of training conflict and suppresses catastrophic forgetting of finished tasks. Extensive experiments on three image recognition datasets validate the effectiveness of our approach in both task-incremental and class-incremental PCL. Our code is available at https://github.com/fanlyu/maxdo.
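The abstract describes AGD and MaxDO only at a high level; the exact formulas appear in the paper itself, not in this record. As a rough aid to intuition, the following is a minimal PyTorch sketch of the min-max idea: a toy asymmetric distance combining a direction term, a magnitude-ratio term, and a tolerance for small opposing gradients, plus a loop that searches for a shared update minimizing the worst per-task discrepancy. The function names (asym_grad_distance, maxdo_update) and the specific cost form are illustrative assumptions, not the authors' definitions.

import torch
import torch.nn.functional as F

def asym_grad_distance(d, g, eps=1e-8, tol=0.1):
    # Hypothetical stand-in for AGD (not the paper's formula): a direction cost
    # plus a magnitude-ratio cost, where a small gradient pointing in the
    # opposite direction is partially tolerated rather than fully penalized.
    cos = F.cosine_similarity(d, g, dim=0, eps=eps)
    ratio = g.norm() / (d.norm() + eps)        # magnitude of g relative to d
    dir_cost = 1.0 - cos                       # grows as directions disagree
    mag_cost = (ratio - 1.0).abs()             # grows as magnitudes diverge
    if cos.item() < 0 and ratio.item() < tol:  # tolerance for small opposing gradients
        dir_cost = dir_cost * (ratio / tol)
    return dir_cost + mag_cost

def maxdo_update(grads, steps=200, lr=0.05):
    # Min-max search in the spirit of MaxDO: optimize a single shared update d
    # to minimize the maximum discrepancy to all parallel task gradients.
    d = torch.stack(grads).mean(dim=0).clone().requires_grad_(True)
    opt = torch.optim.SGD([d], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        worst = torch.stack([asym_grad_distance(d, g) for g in grads]).max()
        worst.backward()
        opt.step()
    return d.detach()

# Toy usage: two parallel tasks with partially conflicting gradients.
g1 = torch.tensor([1.0, 0.5])
g2 = torch.tensor([-0.2, 1.0])
shared_update = maxdo_update([g1, g2])

In a real PCL setting the gradients would come from the parallel tasks' losses on the shared parameters, and the resulting shared update would replace a naive sum of the task gradients.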
Pages: 11377-11386
Number of Pages: 10