Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning

Cited by: 14
Authors
Rafati, Jacob [1 ]
Marcia, Roummel F. [2 ]
Affiliations
[1] Univ Calif Merced, Elect Engn & Comp Sci, Merced, CA 95340 USA
[2] Univ Calif Merced, Appl Math, Merced, CA 95340 USA
Funding
U.S. National Science Foundation
Keywords
Quasi-Newton methods; L-BFGS; trust-region; initialization; deep learning; limited-memory
DOI
10.1109/ICMLA.2018.00081
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
Deep learning algorithms often require solving a highly nonlinear and nonconvex unconstrained optimization problem. Methods for solving optimization problems in machine learning, and in deep learning specifically, are generally restricted to the class of first-order algorithms such as stochastic gradient descent (SGD). A major drawback of SGD methods is that they can fail to escape saddle points. Furthermore, these methods require exhaustive trial and error to fine-tune many learning parameters. Using second-order curvature information to find the search direction can yield more robust convergence on nonconvex optimization problems. However, computing the Hessian matrix is not computationally practical for large-scale problems. Alternatively, quasi-Newton methods construct an approximation of the Hessian matrix to build a quadratic model of the objective function. Quasi-Newton methods, like SGD, require only first-order gradient information, but they can achieve superlinear convergence, which makes them attractive alternatives. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive-definite Hessian approximations. Since the true Hessian matrix is not necessarily positive definite, an extra initialization condition must be introduced when constructing the L-BFGS matrices to avoid false negative curvature information. In this paper, we propose various choices of initialization for the L-BFGS matrices within a trust-region framework. We provide empirical results on the MNIST digit classification task to compare the performance of the trust-region algorithm under different L-BFGS initialization methods.
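
To make the initialization question concrete, the following minimal Python/NumPy sketch (an illustration, not the authors' implementation) shows where the initial matrix B_0 = gamma_k * I enters the standard L-BFGS two-loop recursion, and how curvature pairs with s^T y <= 0 are skipped so that the implicit Hessian approximation stays positive definite. The quadratic toy problem, the memory size m = 5, and the common scaling gamma_k = (y^T y) / (s^T y) are assumptions made for illustration; the paper's contribution is to study alternative choices of this initialization within a trust-region framework.

    import numpy as np

    def lbfgs_direction(grad, s_list, y_list, gamma):
        # Standard L-BFGS two-loop recursion: returns -H_k @ grad, where the
        # initial inverse Hessian is H_0 = (1/gamma) * I, i.e. B_0 = gamma * I.
        q = grad.copy()
        rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
        alphas = []
        # First loop: newest curvature pair back to oldest.
        for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
            a = rho * np.dot(s, q)
            alphas.append(a)
            q = q - a * y
        r = q / gamma  # apply the initial inverse Hessian H_0
        # Second loop: oldest pair forward to newest.
        for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
            b = rho * np.dot(y, r)
            r = r + (a - b) * s
        return -r

    # Toy problem (hypothetical): minimize f(x) = 0.5 * x^T A x with unit steps.
    A = np.diag([1.0, 2.0, 3.0])
    x = np.array([1.0, 1.0, 1.0])
    s_list, y_list, m = [], [], 5  # m = limited-memory size (assumed)
    for _ in range(10):
        g = A @ x
        if s_list:
            s, y = s_list[-1], y_list[-1]
            gamma = np.dot(y, y) / np.dot(s, y)  # common choice of gamma_k for B_0
        else:
            gamma = 1.0
        x_new = x + lbfgs_direction(g, s_list, y_list, gamma)
        s_new, y_new = x_new - x, A @ x_new - g
        if np.dot(s_new, y_new) > 1e-12:  # store only positive-curvature pairs
            s_list.append(s_new)
            y_list.append(y_new)
            if len(s_list) > m:
                s_list.pop(0)
                y_list.pop(0)
        x = x_new
    print("gradient norm after 10 steps:", np.linalg.norm(A @ x))

A trust-region variant, as in the paper, would instead build B_k explicitly from the same (s, y) pairs and minimize the resulting quadratic model subject to a trust-region radius rather than taking the raw quasi-Newton step above; the choice of gamma_k then directly affects whether the model admits false negative curvature.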
Pages: 501 - 508
Page count: 8