Bypassing Stationary Points in Training Deep Learning Models

Cited by: 0
Authors
Jung, Jaeheun [1 ]
Lee, Donghun [2 ]
Affiliations
[1] Korea Univ, Grad Sch Math, Seoul 02841, South Korea
[2] Korea Univ, Dept Math, Seoul 02841, South Korea
Funding
National Research Foundation of Singapore
Keywords
Training; Pipelines; Neural networks; Deep learning; Computational modeling; Vectors; Classification algorithms; Bypassing; gradient descent; neural network; stationary points
DOI
10.1109/TNNLS.2024.3411020
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Gradient-descent-based optimizers are prone to slowdowns when training deep learning models, as stationary points are ubiquitous in the loss landscapes of most neural networks. We present an intuitive concept of bypassing stationary points and realize it in a novel method designed to actively rescue optimizers from slowdowns encountered during neural network training. The method, the bypass pipeline, revitalizes the optimizer by extending the model space and later contracts the model back to its original space under function-preserving algebraic constraints. We implement the method as the bypass algorithm, verify that the algorithm exhibits the theoretically expected bypassing behavior, and demonstrate its empirical benefit on regression and classification benchmarks. The bypass algorithm is highly practical, as it is computationally efficient and compatible with other improvements to first-order optimizers. In addition, bypassing for neural networks opens new theoretical research directions such as model-specific bypassing and neural architecture search (NAS).
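To make the bypassing concept concrete, the following minimal PyTorch sketch illustrates the expansion half of the idea described in the abstract: a hidden layer is temporarily widened so the optimizer gains extra dimensions in which to escape a stall, while the network's input-output function is left unchanged. The zero-initialized outgoing weights used here are an illustrative assumption (a Net2Net-style expansion), not the authors' exact bypass-pipeline construction, and the function names (expand_hidden) are hypothetical.

    import torch
    import torch.nn as nn

    def expand_hidden(fc1, fc2, extra):
        """Widen the hidden layer between two Linear layers without
        changing the network's input-output function (illustrative,
        Net2Net-style; not the paper's exact constraints)."""
        hid = fc1.out_features
        new1 = nn.Linear(fc1.in_features, hid + extra)
        new2 = nn.Linear(hid + extra, fc2.out_features)
        with torch.no_grad():
            # Copy the original weights into the enlarged layers.
            new1.weight[:hid] = fc1.weight
            new1.bias[:hid] = fc1.bias
            new2.weight[:, :hid] = fc2.weight
            new2.bias.copy_(fc2.bias)
            # New hidden units: small random incoming weights but zero
            # outgoing weights, so they contribute nothing to the output
            # yet, while still receiving nonzero gradients in training.
            nn.init.normal_(new1.weight[hid:], std=0.01)
            new1.bias[hid:] = 0.0
            new2.weight[:, hid:] = 0.0
        return new1, new2

    # Usage: expand, verify the function is preserved, then continue
    # training in the larger space.
    net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
    x = torch.randn(5, 4)
    y_before = net(x)
    net[0], net[2] = expand_hidden(net[0], net[2], extra=4)
    print(torch.allclose(y_before, net(x)))  # True: same function, larger space

This sketch covers only the expansion step; the contraction back to the original architecture, where the paper's function-preserving algebraic constraints come in, is the part specific to the bypass pipeline.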
Pages: 18859-18871 (13 pages)