Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation

Cited by: 1
Authors
Picheny, Victor [1]
Dutordoir, Vincent [1]
Artemev, Artem [1]
Durrande, Nicolas [1]
Affiliations
[1] PROWLER.io, 72 Hills Rd, Cambridge CB2 1LA, England
Keywords
Learning rate; Gaussian process; Variational inference
DOI
10.1007/978-3-030-67664-3_26
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many machine learning models require a training procedure based on running stochastic gradient descent. A key element for the efficiency of those algorithms is the choice of the learning rate schedule. While finding good learning rate schedules using Bayesian optimisation has been tackled by several authors, adapting them dynamically in a data-driven way remains an open question. This is of high practical importance to users who need to train a single, expensive model. To tackle this problem, we introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an autoregressive formulation, that flexibly adjusts to abrupt changes in behaviour induced by new learning rate values. As illustrated, this model is well suited to a set of problems: first, the on-line adaptation of the learning rate for a cold-started run; then, tuning the schedule for a set of similar tasks (in a classical BO setup), as well as warm-starting it for a new task.
Pages: 431-446
Page count: 16