Bypassing Stationary Points in Training Deep Learning Models

Cited by: 0
Authors
Jung, Jaeheun [1 ]
Lee, Donghun [2 ]
Affiliations
[1] Korea Univ, Grad Sch Math, Seoul 02841, South Korea
[2] Korea Univ, Dept Math, Seoul 02841, South Korea
Funding
National Research Foundation, Singapore
Keywords
Training; Pipelines; Neural networks; Deep learning; Computational modeling; Vectors; Classification algorithms; Bypassing; gradient descent; neural network; stationary points;
DOI
10.1109/TNNLS.2024.3411020
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Gradient-descent-based optimizers are prone to slowdowns when training deep learning models, as stationary points are ubiquitous in the loss landscapes of most neural networks. We present an intuitive concept of bypassing stationary points and realize it as a novel method designed to actively rescue optimizers from slowdowns encountered in neural network training. The method, the bypass pipeline, revitalizes the optimizer by extending the model space and later contracts the model back to its original space under function-preserving algebraic constraints. We implement the method as the bypass algorithm, verify that the algorithm exhibits the theoretically expected bypassing behavior, and demonstrate its empirical benefit on regression and classification benchmarks. The bypass algorithm is highly practical, as it is computationally efficient and compatible with other improvements to first-order optimizers. In addition, bypassing for neural networks opens new theoretical research directions such as model-specific bypassing and neural architecture search (NAS).
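The abstract describes extending the model space and later contracting it back under function-preserving constraints. The paper's exact constraints are not reproduced in this record; the sketch below only illustrates the general idea with a Net2Net-style widening of a two-layer ReLU network, in which a hidden unit is duplicated (leaving the computed function unchanged) so the optimizer gains new directions to move in, and is later merged back. The `widen` and `contract` helpers and the halve-the-outgoing-weights convention are illustrative assumptions, not the authors' bypass algorithm.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def widen(W1, b1, W2, idx):
    """Function-preserving widening: duplicate hidden unit `idx` and
    halve its outgoing weights, so relu(x @ W1 + b1) @ W2 computes the
    same function while the parameter space gains a dimension."""
    W1w = np.concatenate([W1, W1[:, idx:idx + 1]], axis=1)   # copy incoming weights
    b1w = np.concatenate([b1, b1[idx:idx + 1]])              # copy bias
    W2w = W2.copy()
    W2w[idx, :] /= 2.0                                       # split outgoing weights
    W2w = np.concatenate([W2w, W2w[idx:idx + 1, :]], axis=0)
    return W1w, b1w, W2w

def contract(W1, b1, W2, i, j):
    """Merge duplicated units i and j back (assumes their incoming
    weights are still tied), restoring the original width."""
    W2m = W2.copy()
    W2m[i, :] += W2m[j, :]                                   # recombine outgoing weights
    keep = [k for k in range(W1.shape[1]) if k != j]
    return W1[:, keep], b1[keep], W2m[keep, :]
```

In an actual training loop one would widen when the optimizer stalls near a stationary point, train in the enlarged space, and contract once progress resumes; the expand and contract steps above are exact inverses only while the duplicated units stay tied, which is the simplifying assumption of this sketch.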
Pages: 18859-18871 (13 pages)
Related Papers (50 total; items [41]-[50] shown)
  • [41] The Value of Pre-Training for Deep Learning Acute Stroke Triaging Models. Yu, Yannan; Xie, Yuan; Gong, Enhao; Thamm, Thoralf; Ouyang, Jiahong; Christensen, Soren; Lansberg, Maarten; Albers, Gregory; Zaharchuk, Greg. STROKE, 2020, 51
  • [42] MPCA SGD-A Method for Distributed Training of Deep Learning Models on Spark. Langer, Matthias; Hall, Ashley; He, Zhen; Rahayu, Wenny. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (11): 2540-2556
  • [43] DeepClone: Lightweight State Replication of Deep Learning Models for Data Parallel Training. Nicolae, Bogdan; Wozniak, Justin M.; Dorier, Matthieu; Cappello, Franck. 2020 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2020), 2020: 226-236
  • [44] Varuna: Scalable, Low-cost Training of Massive Deep Learning Models. Athlur, Sanjith; Saran, Nitika; Sivathanu, Muthian; Ramjee, Ramachandran; Kwatra, Nipun. PROCEEDINGS OF THE SEVENTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS '22), 2022: 472-487
  • [45] Training deep-learning segmentation models from severely limited data. Zhao, Yao; Rhee, Dong Joo; Cardenas, Carlos; Court, Laurence E.; Yang, Jinzhong. MEDICAL PHYSICS, 2021, 48 (04): 1697-1706
  • [46] Industrial Object Detection: Leveraging Synthetic Data for Training Deep Learning Models. Ouarab, Sarah; Boutteau, Remi; Romeo, Katerine; Lecomte, Christele; Laignel, Aristid; Ragot, Nicolas; Duval, Fabrice. INDUSTRIAL ENGINEERING AND APPLICATIONS-EUROPE, ICIEA-EU 2024, 2024, 507: 200-212
  • [47] Minimizing Energy Consumption of Deep Learning Models by Energy-Aware Training. Lazzaro, Dario; Cina, Antonio Emanuele; Pintor, Maura; Demontis, Ambra; Biggio, Battista; Roli, Fabio; Pelillo, Marcello. IMAGE ANALYSIS AND PROCESSING, ICIAP 2023, PT II, 2023, 14234: 515-526
  • [48] Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models. Prashanthi, S. K.; Kesanapalli, Sai Anuroop; Simmhan, Yogesh. PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2022, 6 (03)
  • [49] Comparative analysis of training approaches for deep learning geographic atrophy segmentation models. Musial, Gwen; Zhang, Qinqin; Salehi, Ali; Herrera, Gissel; Shen, Mengxi; Gregori, Giovanni; Rosenfeld, Philip J.; Cheng, Yuxuan; Wang, Ruikang K. INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2024, 65 (07)
  • [50] Training Strategies for Radiology Deep Learning Models in Data-limited Scenarios. Candemir, Sema; Nguyen, Xuan V.; Folio, Les R.; Prevedello, Luciano M. RADIOLOGY-ARTIFICIAL INTELLIGENCE, 2021, 3 (06)