Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation

Cited by: 1
Authors
Picheny, Victor [1 ]
Dutordoir, Vincent [1 ]
Artemev, Artem [1 ]
Durrande, Nicolas [1 ]
Affiliations
[1] PROWLER.io, 72 Hills Rd, Cambridge CB2 1LA, England
Keywords
Learning rate; Gaussian process; Variational inference
DOI
10.1007/978-3-030-67664-3_26
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many machine learning models require a training procedure based on running stochastic gradient descent. A key element for the efficiency of those algorithms is the choice of the learning rate schedule. While finding good learning rate schedules using Bayesian optimisation has been tackled by several authors, adapting the learning rate dynamically in a data-driven way remains an open question. This is of high practical importance to users who need to train a single, expensive model. To tackle this problem, we introduce an original probabilistic model for optimiser traces, based on latent Gaussian processes and an auto-regressive formulation, that flexibly adjusts to the abrupt changes of behaviour induced by new learning rate values. As illustrated, this model is well suited to a set of problems: first, the on-line adaptation of the learning rate for a cold-started run; then, tuning the schedule for a set of similar tasks (in a classical BO setup), as well as warm-starting it for a new task.
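To make the "classical BO setup" mentioned in the abstract concrete, here is a minimal sketch of Bayesian optimisation of a single constant learning rate. This is *not* the paper's latent-GP/auto-regressive trace model: it uses a plain NumPy-only GP surrogate with an expected-improvement acquisition on a toy noisy-SGD problem, and all names, bounds, and kernel settings (`sgd_final_loss`, the `[-4, 0]` log-learning-rate range, the RBF length-scale) are illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt

def sgd_final_loss(log_lr, steps=100, seed=0):
    """Run noisy SGD on f(w) = 0.5 * w^2 and return the final loss.

    A stand-in for an expensive training run; the learning rate is the
    only tuning parameter, searched on a log10 scale.
    """
    rng = np.random.default_rng(seed)
    lr = 10.0 ** float(log_lr)
    w = 5.0
    for _ in range(steps):
        grad = w + 0.1 * rng.standard_normal()  # noisy gradient of 0.5*w^2
        w -= lr * grad
    return 0.5 * w * w

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel between two 1-D point sets."""
    d = np.asarray(a)[:, None] - np.asarray(b)[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_tr, y_tr, x_te, noise=1e-4):
    """Posterior mean and std of a zero-mean GP at test points x_te."""
    K = rbf(x_tr, x_tr) + noise * np.eye(len(x_tr))
    Ks = rbf(x_tr, x_te)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v * v, axis=0), 1e-12, None)
    return Ks.T @ alpha, np.sqrt(var)

def norm_cdf(z):
    return 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))

# BO loop: 3 random initial evaluations, then 7 expected-improvement steps.
rng = np.random.default_rng(1)
grid = np.linspace(-4.0, 0.0, 200)          # candidate log10 learning rates
x_obs = [float(x) for x in rng.uniform(-4.0, 0.0, size=3)]
y_obs = [sgd_final_loss(x) for x in x_obs]
for _ in range(7):
    x, y = np.array(x_obs), np.array(y_obs)
    mu_y, sd_y = y.mean(), y.std() + 1e-9   # standardise targets for the GP
    mu, sigma = gp_posterior(x, (y - mu_y) / sd_y, grid)
    best = (y.min() - mu_y) / sd_y
    z = (best - mu) / sigma                  # EI for minimisation
    ei = (best - mu) * norm_cdf(z) + sigma * np.exp(-0.5 * z * z) / np.sqrt(2 * np.pi)
    x_next = float(grid[np.argmax(ei)])
    x_obs.append(x_next)
    y_obs.append(sgd_final_loss(x_next))

best_log_lr = float(x_obs[int(np.argmin(y_obs))])
best_loss = float(min(y_obs))
```

The paper's contribution replaces the scalar "final loss" objective here with a probabilistic model of the whole optimiser trace, which is what enables on-line, cold-started adaptation rather than only offline schedule search.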
Pages: 431-446 (16 pages)