Improving L-BFGS Initialization For Trust-Region Methods In Deep Learning

Cited by: 14
Authors
Rafati, Jacob [1]
Marcia, Roummel F. [2]
Affiliations
[1] Univ Calif Merced, Elect Engn & Comp Sci, Merced, CA 95340 USA
[2] Univ Calif Merced, Appl Math, Merced, CA 95340 USA
Funding
National Science Foundation (United States)
Keywords
Quasi-Newton Methods; L-BFGS; Trust-Region; Initialization; Deep Learning; LIMITED-MEMORY;
DOI
10.1109/ICMLA.2018.00081
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning algorithms often require solving a highly nonlinear, nonconvex unconstrained optimization problem. Methods for solving optimization problems in machine learning, and in deep learning specifically, are generally restricted to first-order algorithms such as stochastic gradient descent (SGD). A major drawback of SGD methods is their undesirable tendency not to escape saddle points. Furthermore, these methods require exhaustive trial and error to fine-tune many learning parameters. Using second-order curvature information to find the search direction can lead to more robust convergence for nonconvex optimization problems. However, computing the Hessian matrix for large-scale problems is not computationally practical. Alternatively, quasi-Newton methods construct an approximation of the Hessian matrix to build a quadratic model of the objective function. Quasi-Newton methods, like SGD, require only first-order gradient information, but they can achieve superlinear convergence, which makes them attractive alternatives. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive-definite Hessian approximations. Since the true Hessian matrix is not necessarily positive definite, an extra initialization condition must be introduced when constructing the L-BFGS matrices to avoid false negative curvature information. In this paper, we propose various choices for initializing the L-BFGS matrices within a trust-region framework. We provide empirical results on the MNIST digit classification task to compare the performance of the trust-region algorithm with different L-BFGS initialization methods.
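As a rough illustration of what "initialization" means here: L-BFGS builds its Hessian approximation starting from an initial matrix, commonly B0 = gamma * I. The sketch below computes the conventional textbook scaling gamma = (y^T y) / (s^T y) from one step s and the corresponding gradient change y, and rejects the pair when the curvature condition s^T y > 0 fails. This record does not state the initializations the paper proposes, so this is not the authors' method; the function names, the `eps` tolerance, and the choice to return `None` are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences of floats."""
    return sum(a * b for a, b in zip(u, v))


def initial_scaling(s, y, eps=1e-10):
    """Return gamma for the initial matrix B0 = gamma * I.

    s: step between two iterates (w_new - w_old)
    y: change in gradient over that step (g_new - g_old)

    Uses the conventional choice gamma = (y^T y) / (s^T y).
    Returns None when s^T y <= eps, i.e. when the pair carries
    non-positive curvature and would break positive definiteness
    (hypothetical handling, chosen for this sketch).
    """
    sy = dot(s, y)
    if sy <= eps:
        return None
    return dot(y, y) / sy


# Convex quadratic with Hessian diag(2, 4): y = H s.
gamma = initial_scaling([1.0, 1.0], [2.0, 4.0])

# Indefinite Hessian diag(2, -4): the curvature condition fails.
rejected = initial_scaling([1.0, 1.0], [2.0, -4.0])
```

For the convex example, gamma lies between the smallest and largest Hessian eigenvalues (2 and 4), which is why this scaling is a common default; the indefinite example shows the kind of pair that needs special handling when the true Hessian is not positive definite.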
Pages: 501-508
Number of pages: 8