Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation

Cited by: 1

Authors:

Picheny, Victor [1]

Dutordoir, Vincent [1]

Artemev, Artem [1]

Durrande, Nicolas [1]

Affiliation:

[1] PROWLER.io, 72 Hills Rd, Cambridge CB2 1LA, England

Source:

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2020, PT III | 2021 / Vol. 12459

Keywords:

Learning rate; Gaussian process; Variational inference;

DOI:

10.1007/978-3-030-67664-3_26

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Many machine learning models require a training procedure based on running stochastic gradient descent. A key element for the efficiency of those algorithms is the choice of the learning rate schedule. While finding good learning rate schedules using Bayesian optimisation has been tackled by several authors, adapting them dynamically in a data-driven way is an open question. This is of high practical importance to users who need to train a single, expensive model. To tackle this problem, we introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an auto-regressive formulation, that flexibly adjusts to abrupt changes of behaviour induced by new learning rate values. As illustrated, this model is well-suited to tackle a set of problems: first, for the on-line adaptation of the learning rate for a cold-started run; then, for tuning the schedule for a set of similar tasks (in a classical BO setup), as well as warm-starting it for a new task.


Pages: 431-446

Page count: 16

50 items in total

[21] Stochastic gradient descent tricks
Bottou, Léon
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012, 7700 : 421 - 436
[22] Byzantine Stochastic Gradient Descent
Alistarh, Dan
Allen-Zhu, Zeyuan
Li, Jerry
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
[23] Exploration of the application of optimisation algorithm using stochastic gradient descent method in satellite resource allocation
Zhao D.
Xiong W.
Shi J.
Applied Mathematics and Nonlinear Sciences, 2024, 9 (01)
[24] Convergence of Stochastic Gradient Descent for PCA
Shamir, Ohad
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
[25] Stochastic Gradient Descent in Continuous Time
Sirignano, Justin
Spiliopoulos, Konstantinos
SIAM JOURNAL ON FINANCIAL MATHEMATICS, 2017, 8 (01) : 933 - 961
[26] On the Hyperparameters in Stochastic Gradient Descent with Momentum
Shi, Bin
JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
[27] On the Generalization of Stochastic Gradient Descent with Momentum
Ramezani-Kebrya, Ali
Antonakopoulos, Kimon
Cevher, Volkan
Khisti, Ashish
Liang, Ben
JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 56
[28] On the different regimes of stochastic gradient descent
Sclocchi, Antonio
Wyart, Matthieu
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2023, 121 (09)
[29] BACKPROPAGATION AND STOCHASTIC GRADIENT DESCENT METHOD
AMARI, S
NEUROCOMPUTING, 1993, 5 (4-5) : 185 - 196
[30] Randomized Stochastic Gradient Descent Ascent
Sebbouh, Othmane
Cuturi, Marco
Peyre, Gabriel
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
