Bypassing Stationary Points in Training Deep Learning Models

Cited by: 0
Authors
Jung, Jaeheun [1 ]
Lee, Donghun [2 ]
Affiliations
[1] Korea Univ, Grad Sch Math, Seoul 02841, South Korea
[2] Korea Univ, Dept Math, Seoul 02841, South Korea
Funding
National Research Foundation, Singapore
Keywords
Training; Pipelines; Neural networks; Deep learning; Computational modeling; Vectors; Classification algorithms; Bypassing; gradient descent; neural network; stationary points;
DOI
10.1109/TNNLS.2024.3411020
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Gradient-descent-based optimizers are prone to slowdowns when training deep learning models, as stationary points are ubiquitous in the loss landscapes of most neural networks. We present an intuitive concept of bypassing stationary points and realize it as a novel method designed to actively rescue optimizers from slowdowns encountered in neural network training. The method, the bypass pipeline, revitalizes the optimizer by extending the model space and later contracts the model back to its original space under function-preserving algebraic constraints. We implement the method as the bypass algorithm, verify that the algorithm exhibits the theoretically expected bypassing behavior, and demonstrate its empirical benefit on regression and classification benchmarks. The bypass algorithm is highly practical, as it is computationally efficient and compatible with other improvements to first-order optimizers. In addition, bypassing for neural networks opens new theoretical research directions such as model-specific bypassing and neural architecture search (NAS).
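The abstract describes extending the model space and later contracting it back under function-preserving constraints. The paper's exact constraints are not reproduced in this record; the sketch below only illustrates the general idea with a Net2Net-style widening of a two-layer ReLU network, in which a hidden unit is duplicated (leaving the computed function unchanged) so the optimizer gains new directions to move in, and is later merged back. The `widen` and `contract` helpers and the halve-the-outgoing-weights convention are illustrative assumptions, not the authors' bypass algorithm.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def widen(W1, b1, W2, idx):
    """Function-preserving widening: duplicate hidden unit `idx` and
    halve its outgoing weights, so relu(x @ W1 + b1) @ W2 computes the
    same function while the parameter space gains a dimension."""
    W1w = np.concatenate([W1, W1[:, idx:idx + 1]], axis=1)   # copy incoming weights
    b1w = np.concatenate([b1, b1[idx:idx + 1]])              # copy bias
    W2w = W2.copy()
    W2w[idx, :] /= 2.0                                       # split outgoing weights
    W2w = np.concatenate([W2w, W2w[idx:idx + 1, :]], axis=0)
    return W1w, b1w, W2w

def contract(W1, b1, W2, i, j):
    """Merge duplicated units i and j back (assumes their incoming
    weights are still tied), restoring the original width."""
    W2m = W2.copy()
    W2m[i, :] += W2m[j, :]                                   # recombine outgoing weights
    keep = [k for k in range(W1.shape[1]) if k != j]
    return W1[:, keep], b1[keep], W2m[keep, :]
```

In an actual training loop one would widen when the optimizer stalls near a stationary point, train in the enlarged space, and contract once progress resumes; the expand and contract steps above are exact inverses only while the duplicated units stay tied, which is the simplifying assumption of this sketch.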
Pages: 18859-18871 (13 pages)
Related Papers (50 total; items [41]-[50] shown)
  • [41] The Value of Pre-Training for Deep Learning Acute Stroke Triaging Models. Yu, Yannan; Xie, Yuan; Gong, Enhao; Thamm, Thoralf; Ouyang, Jiahong; Christensen, Soren; Lansberg, Maarten; Albers, Gregory; Zaharchuk, Greg. STROKE, 2020, 51
  • [42] MPCA SGD-A Method for Distributed Training of Deep Learning Models on Spark. Langer, Matthias; Hall, Ashley; He, Zhen; Rahayu, Wenny. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (11): 2540-2556
  • [43] DeepClone: Lightweight State Replication of Deep Learning Models for Data Parallel Training. Nicolae, Bogdan; Wozniak, Justin M.; Dorier, Matthieu; Cappello, Franck. 2020 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2020), 2020: 226-236
  • [44] Varuna: Scalable, Low-cost Training of Massive Deep Learning Models. Athlur, Sanjith; Saran, Nitika; Sivathanu, Muthian; Ramjee, Ramachandran; Kwatra, Nipun. PROCEEDINGS OF THE SEVENTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS '22), 2022: 472-487
  • [45] Training deep-learning segmentation models from severely limited data. Zhao, Yao; Rhee, Dong Joo; Cardenas, Carlos; Court, Laurence E.; Yang, Jinzhong. MEDICAL PHYSICS, 2021, 48 (04): 1697-1706
  • [46] Industrial Object Detection: Leveraging Synthetic Data for Training Deep Learning Models. Ouarab, Sarah; Boutteau, Remi; Romeo, Katerine; Lecomte, Christele; Laignel, Aristid; Ragot, Nicolas; Duval, Fabrice. INDUSTRIAL ENGINEERING AND APPLICATIONS-EUROPE, ICIEA-EU 2024, 2024, 507: 200-212
  • [47] Minimizing Energy Consumption of Deep Learning Models by Energy-Aware Training. Lazzaro, Dario; Cina, Antonio Emanuele; Pintor, Maura; Demontis, Ambra; Biggio, Battista; Roli, Fabio; Pelillo, Marcello. IMAGE ANALYSIS AND PROCESSING, ICIAP 2023, PT II, 2023, 14234: 515-526
  • [48] Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models. Prashanthi, S. K.; Kesanapalli, Sai Anuroop; Simmhan, Yogesh. PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2022, 6 (03)
  • [49] Comparative analysis of training approaches for deep learning geographic atrophy segmentation models. Musial, Gwen; Zhang, Qinqin; Salehi, Ali; Herrera, Gissel; Shen, Mengxi; Gregori, Giovanni; Rosenfeld, Philip J.; Cheng, Yuxuan; Wang, Ruikang K. INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2024, 65 (07)
  • [50] Training Strategies for Radiology Deep Learning Models in Data-limited Scenarios. Candemir, Sema; Nguyen, Xuan V.; Folio, Les R.; Prevedello, Luciano M. RADIOLOGY-ARTIFICIAL INTELLIGENCE, 2021, 3 (06)