Bypassing Stationary Points in Training Deep Learning Models

Cited by: 0
Authors
Jung, Jaeheun [1 ]
Lee, Donghun [2 ]
Affiliations
[1] Korea Univ, Grad Sch Math, Seoul 02841, South Korea
[2] Korea Univ, Dept Math, Seoul 02841, South Korea
Funding
National Research Foundation of Singapore
Keywords
Training; Pipelines; Neural networks; Deep learning; Computational modeling; Vectors; Classification algorithms; Bypassing; gradient descent; neural network; stationary points;
DOI
10.1109/TNNLS.2024.3411020
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Gradient-descent-based optimizers are prone to slowdowns when training deep learning models, as stationary points are ubiquitous in the loss landscapes of most neural networks. We present an intuitive concept of bypassing stationary points and realize it as a novel method designed to actively rescue optimizers from the slowdowns encountered in neural network training. The method, the bypass pipeline, revitalizes the optimizer by extending the model space and later contracts the model back to its original space under function-preserving algebraic constraints. We implement the method as the bypass algorithm, verify that the algorithm exhibits the theoretically expected bypassing behavior, and demonstrate its empirical benefit on regression and classification benchmarks. The bypass algorithm is highly practical, as it is computationally efficient and compatible with other improvements to first-order optimizers. In addition, bypassing for neural networks opens new lines of theoretical research, such as model-specific bypassing and neural architecture search (NAS).
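The extend-then-contract idea can be made concrete with a small, self-contained sketch. The NumPy snippet below is an illustrative example only, not the authors' bypass pipeline: it widens a one-hidden-layer ReLU network by duplicating a hidden unit and splitting its outgoing weights (a function-preserving expansion), then contracts back to the original width by re-absorbing the clone's outgoing weights. All names (forward, relu, the layer sizes) are hypothetical and chosen for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, W1, b1, W2, b2):
    # One-hidden-layer ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2
    return W2 @ relu(W1 @ x + b1) + b2

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 4, 8, 3
W1 = rng.normal(size=(d_hid, d_in)); b1 = rng.normal(size=d_hid)
W2 = rng.normal(size=(d_out, d_hid)); b2 = rng.normal(size=d_out)
x = rng.normal(size=d_in)

# Expansion: duplicate hidden unit j and split its outgoing weights 50/50,
# so the widened network computes exactly the same function.
j = 2
W1_big = np.vstack([W1, W1[j:j + 1]])      # copy incoming weights of unit j
b1_big = np.append(b1, b1[j])              # copy its bias
W2_big = np.hstack([W2, W2[:, j:j + 1]])   # copy its outgoing column ...
W2_big[:, j] *= 0.5                        # ... and split the contribution
W2_big[:, -1] *= 0.5                       #     between original and clone

# Contraction: merge the clone back by summing its outgoing weights into
# the original unit, recovering the original network exactly.
W2_small = W2_big[:, :d_hid].copy()
W2_small[:, j] += W2_big[:, -1]
W1_small, b1_small = W1_big[:d_hid], b1_big[:d_hid]

print(np.allclose(forward(x, W1, b1, W2, b2),
                  forward(x, W1_big, b1_big, W2_big, b2)))        # True
print(np.allclose(forward(x, W1, b1, W2, b2),
                  forward(x, W1_small, b1_small, W2_small, b2)))  # True
```

In the paper, the contraction is applied after further optimization in the extended space and is governed by function-preserving algebraic constraints; the immediate contraction above only demonstrates the underlying algebraic identity.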
Pages: 18859-18871
Page count: 13