Improving L-BFGS Initialization For Trust-Region Methods In Deep Learning

Cited by: 14

Authors:

Rafati, Jacob [1]

Marcia, Roummel F. [2]

Affiliations:

[1] Univ Calif Merced, Elect Engn & Comp Sci, Merced, CA 95340 USA

[2] Univ Calif Merced, Appl Math, Merced, CA 95340 USA

Source:

2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA) | 2018

Funding:

National Science Foundation (USA)

Keywords:

Quasi-Newton Methods; L-BFGS; Trust-Region; Initialization; Deep Learning; LIMITED-MEMORY

DOI:

10.1109/ICMLA.2018.00081

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Deep learning algorithms often require solving a highly nonlinear and nonconvex unconstrained optimization problem. Methods for solving optimization problems in machine learning, and in deep learning specifically, are generally restricted to the class of first-order algorithms such as stochastic gradient descent (SGD). A major drawback of SGD methods is that they can fail to escape saddle points. Furthermore, these methods require exhaustive trial and error to fine-tune many learning parameters. Using second-order curvature information to find the search direction can make convergence more robust for nonconvex optimization problems. However, computing the Hessian matrix for large-scale problems is not computationally practical. Alternatively, quasi-Newton methods construct an approximation of the Hessian matrix to build a quadratic model of the objective function. Quasi-Newton methods, like SGD, require only first-order gradient information, but they can achieve superlinear convergence, which makes them attractive alternatives. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive-definite Hessian approximations. Since the true Hessian matrix is not necessarily positive definite, an extra initialization condition must be introduced when constructing the L-BFGS matrices to avoid false negative curvature information. In this paper, we propose various choices for initialization methods of the L-BFGS matrices within a trust-region framework. We provide empirical results on the classification task of the MNIST digits dataset to compare the performance of the trust-region algorithm with different L-BFGS initialization methods.
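
To make the role of the initial matrix concrete, the following is a minimal sketch and not the initialization proposed in the paper. It assumes the conventional textbook scaling gamma = (y^T y) / (s^T y) for B_0 = gamma * I and applies dense BFGS updates for illustration only (L-BFGS itself keeps just the most recent pairs and never forms the full matrix). The function names `initial_scaling` and `bfgs_matrix`, the indefinite test Hessian `A`, and the sample steps are all hypothetical.

```python
import numpy as np


def initial_scaling(s, y):
    """Conventional scaling gamma = (y^T y) / (s^T y) for B_0 = gamma * I.

    Assumption: this is the common textbook choice, not the paper's proposal.
    """
    return float(y @ y) / float(s @ y)


def bfgs_matrix(pairs, gamma, tol=1e-10):
    """Dense BFGS Hessian approximation started from B_0 = gamma * I.

    pairs holds (s, y) with s = w_{k+1} - w_k and y = g_{k+1} - g_k,
    oldest first. Pairs that violate the curvature condition
    s^T y > 0 are skipped so that B stays positive definite.
    """
    n = pairs[0][0].size
    B = gamma * np.eye(n)
    for s, y in pairs:
        sy = float(s @ y)
        if sy <= tol:
            continue  # skip pairs without positive curvature
        Bs = B @ s
        B = B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(y, y) / sy
    return B


# Hypothetical quadratic model with an indefinite Hessian A,
# so that gradient differences are y = A s.
A = np.diag([2.0, 1.0, -0.5])
steps = [np.array([1.0, 0.0, 0.5]), np.array([0.0, 1.0, 0.2])]
pairs = [(s, A @ s) for s in steps]

# Scale B_0 with the most recent pair, then apply the updates.
gamma = initial_scaling(*pairs[-1])
B = bfgs_matrix(pairs, gamma)

print("gamma:", gamma)
print("smallest eigenvalue of A:", np.linalg.eigvalsh(A).min())
print("smallest eigenvalue of B:", np.linalg.eigvalsh(B).min())
```

In this sketch the true Hessian `A` has a negative eigenvalue, while `B` remains positive definite and still satisfies the secant equation B s = y for the most recent pair. This gap between the approximation and the true curvature is the setting in which the abstract says the choice of initialization matters.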


Pages: 501-508

Number of pages: 8


[21] Globally convergent DC trust-region methods
Hoai An Le Thi
Van Ngai Huynh
Tao Pham Dinh
Vaz, A. Ismael F.
Vicente, L. N.
JOURNAL OF GLOBAL OPTIMIZATION, 2014, 59 (2-3) : 209 - 225
[22] A class of trust-region methods for parallel optimization
Hough, PD
Meza, JC
SIAM JOURNAL ON OPTIMIZATION, 2002, 13 (01) : 264 - 282
[23] An Improved Trust-Region Method for Off-Policy Deep Reinforcement Learning
Li, Hepeng
Zhong, Xiangnan
He, Haibo
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
[24] Combining line search and trust-region methods for l1-minimization
Esmaeili, Hamid
Rostami, Majid
Kimiaei, Morteza
INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2018, 95 (10) : 1950 - 1972
[25] On Improving Trust-Region Variable Projection Algorithms for Separable Nonlinear Least Squares Learning
Mizutani, Eiji
Demmel, James
2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011, : 397 - 404
[26] Medical image fusion using Transfer Learning and L-BFGS optimization algorithm
Jiang, Jionghui
Feng, Xi'an
Hu, Zhiwen
Hu, Xiaodong
Liu, Fen
Huang, Hui
INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2021, 31 (04) : 2003 - 2013
[27] Deep Neural Networks Training by Stochastic Quasi-Newton Trust-Region Methods
Yousefi, Mahsa
Martinez, Angeles
ALGORITHMS, 2023, 16 (10)
[28] Trust-Region Methods for Nonconvex Sparse Recovery Optimization
Adhikari, Lasith
Marcia, Roummel F.
Erway, Jennifer B.
Plemmons, Robert J.
PROCEEDINGS OF 2016 INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY AND ITS APPLICATIONS (ISITA 2016), 2016, : 275 - 279
[29] ACCELERATED LINE-SEARCH AND TRUST-REGION METHODS
Absil, P. -A.
Gallivan, K. A.
SIAM JOURNAL ON NUMERICAL ANALYSIS, 2009, 47 (02) : 997 - 1018
[30] BEHAVIOR OF TRUST-REGION METHODS IN FIML-ESTIMATION.
Weihs, C.
Calzolari, G.
Panattoni, L.
Computing (Vienna/New York), 1987, 38 (02) : 89 - 100
