Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation

Cited by: 1
Authors
Picheny, Victor [1 ]
Dutordoir, Vincent [1 ]
Artemev, Artem [1 ]
Durrande, Nicolas [1 ]
Affiliation
[1] PROWLER Io, 72 Hills Rd, Cambridge CB2 1LA, England
Keywords
Learning rate; Gaussian process; Variational inference;
DOI
10.1007/978-3-030-67664-3_26
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many machine learning models require a training procedure based on running stochastic gradient descent. A key element for the efficiency of those algorithms is the choice of the learning rate schedule. While finding good learning rate schedules using Bayesian optimisation has been tackled by several authors, adapting the schedule dynamically in a data-driven way is an open question. This is of high practical importance to users who need to train a single, expensive model. To tackle this problem, we introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an auto-regressive formulation, that flexibly adjusts to the abrupt changes in behaviour induced by new learning rate values. As illustrated, this model is well suited to a range of problems: first, the on-line adaptation of the learning rate for a cold-started run; then, tuning the schedule over a set of similar tasks (in a classical BO setup), as well as warm-starting the schedule for a new task.
Pages: 431-446
Number of pages: 16
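
To make the setting described in the abstract concrete, the sketch below shows a generic Bayesian-optimisation loop that fits a Gaussian process surrogate to the loss observed after short SGD training segments and uses an expected-improvement acquisition to pick the learning rate for the next segment. This is only a minimal illustration under assumed names and a toy quadratic objective (train_segment, expected_improvement, and the scikit-learn surrogate are all stand-ins); it does not implement the paper's latent-GP, auto-regressive model of optimiser traces or its on-line schedule adaptation.

# Minimal sketch: BO over the (log) learning rate of SGD on a toy problem.
# Hypothetical helpers; NOT the authors' trace model.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def train_segment(log_lr, n_steps=50):
    """Run a few SGD steps on a toy quadratic and return the final loss."""
    lr = 10.0 ** log_lr
    w = np.array([5.0, -3.0])
    for _ in range(n_steps):
        grad = 2.0 * w + rng.normal(scale=0.1, size=2)  # noisy gradient of ||w||^2
        w = w - lr * grad
    return float(np.sum(w ** 2))

def expected_improvement(gp, X_cand, best):
    """Standard expected-improvement acquisition for minimisation."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Observed (log10 learning rate, end-of-segment loss) pairs.
X = rng.uniform(-4.0, -0.5, size=(3, 1))
y = np.array([train_segment(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for it in range(10):
    gp.fit(X, y)
    candidates = np.linspace(-4.0, -0.5, 200).reshape(-1, 1)
    ei = expected_improvement(gp, candidates, y.min())
    next_log_lr = candidates[np.argmax(ei)]  # learning rate tried in the next segment
    loss = train_segment(next_log_lr[0])
    X = np.vstack([X, next_log_lr])
    y = np.append(y, loss)
    print(f"iter {it}: log10(lr) = {next_log_lr[0]:.2f}, loss = {loss:.4f}")

print("best log10(lr):", X[np.argmin(y), 0])

Note that this sketch restarts every segment from the same initial weights, so it effectively searches for a single good constant learning rate; the paper's contribution is precisely to go beyond this by modelling the trace of a single ongoing run and adapting the schedule as training progresses.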