Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation

Cited by: 1
Authors
Picheny, Victor [1]
Dutordoir, Vincent [1]
Artemev, Artem [1]
Durrande, Nicolas [1]
Affiliations
[1] PROWLER.io, 72 Hills Rd, Cambridge CB2 1LA, England
Keywords
Learning rate; Gaussian process; Variational inference
DOI
10.1007/978-3-030-67664-3_26
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many machine learning models require a training procedure based on running stochastic gradient descent. A key element for the efficiency of those algorithms is the choice of the learning rate schedule. While finding good learning rate schedules using Bayesian optimisation has been tackled by several authors, adapting them dynamically in a data-driven way is an open question. This is of high practical importance to users who need to train a single, expensive model. To tackle this problem, we introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an auto-regressive formulation, that flexibly adjusts to abrupt changes of behaviour induced by new learning rate values. As illustrated, this model is well suited to a set of problems: first, the on-line adaptation of the learning rate for a cold-started run; then, tuning the schedule for a set of similar tasks (in a classical BO setup), as well as warm-starting it for a new task.
Pages: 431-446
Page count: 16