Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning

Cited by: 14
Authors
Rafati, Jacob [1 ]
Marcia, Roummel F. [2 ]
Affiliations
[1] Univ Calif Merced, Elect Engn & Comp Sci, Merced, CA 95340 USA
[2] Univ Calif Merced, Appl Math, Merced, CA 95340 USA
Funding
U.S. National Science Foundation
Keywords
Quasi-Newton methods; L-BFGS; trust-region; initialization; deep learning; limited-memory
DOI
10.1109/ICMLA.2018.00081
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
Deep learning algorithms often require solving a highly nonlinear and nonconvex unconstrained optimization problem. Methods for solving optimization problems in machine learning, and in deep learning specifically, are generally restricted to the class of first-order algorithms such as stochastic gradient descent (SGD). A major drawback of SGD methods is that they can fail to escape saddle points. Furthermore, these methods require exhaustive trial and error to fine-tune many learning parameters. Using second-order curvature information to find the search direction can yield more robust convergence on nonconvex optimization problems. However, computing the Hessian matrix is not computationally practical for large-scale problems. Alternatively, quasi-Newton methods construct an approximation of the Hessian matrix to build a quadratic model of the objective function. Quasi-Newton methods, like SGD, require only first-order gradient information, but they can achieve superlinear convergence, which makes them attractive alternatives. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive-definite Hessian approximations. Since the true Hessian matrix is not necessarily positive definite, an extra initialization condition must be introduced when constructing the L-BFGS matrices to avoid false negative curvature information. In this paper, we propose various choices of initialization for the L-BFGS matrices within a trust-region framework. We provide empirical results on the MNIST digit classification task to compare the performance of the trust-region algorithm under different L-BFGS initialization methods.
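
To make the initialization question concrete, the following minimal Python/NumPy sketch (an illustration, not the authors' implementation) shows where the initial matrix B_0 = gamma_k * I enters the standard L-BFGS two-loop recursion, and how curvature pairs with s^T y <= 0 are skipped so that the implicit Hessian approximation stays positive definite. The quadratic toy problem, the memory size m = 5, and the common scaling gamma_k = (y^T y) / (s^T y) are assumptions made for illustration; the paper's contribution is to study alternative choices of this initialization within a trust-region framework.

    import numpy as np

    def lbfgs_direction(grad, s_list, y_list, gamma):
        # Standard L-BFGS two-loop recursion: returns -H_k @ grad, where the
        # initial inverse Hessian is H_0 = (1/gamma) * I, i.e. B_0 = gamma * I.
        q = grad.copy()
        rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
        alphas = []
        # First loop: newest curvature pair back to oldest.
        for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
            a = rho * np.dot(s, q)
            alphas.append(a)
            q = q - a * y
        r = q / gamma  # apply the initial inverse Hessian H_0
        # Second loop: oldest pair forward to newest.
        for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
            b = rho * np.dot(y, r)
            r = r + (a - b) * s
        return -r

    # Toy problem (hypothetical): minimize f(x) = 0.5 * x^T A x with unit steps.
    A = np.diag([1.0, 2.0, 3.0])
    x = np.array([1.0, 1.0, 1.0])
    s_list, y_list, m = [], [], 5  # m = limited-memory size (assumed)
    for _ in range(10):
        g = A @ x
        if s_list:
            s, y = s_list[-1], y_list[-1]
            gamma = np.dot(y, y) / np.dot(s, y)  # common choice of gamma_k for B_0
        else:
            gamma = 1.0
        x_new = x + lbfgs_direction(g, s_list, y_list, gamma)
        s_new, y_new = x_new - x, A @ x_new - g
        if np.dot(s_new, y_new) > 1e-12:  # store only positive-curvature pairs
            s_list.append(s_new)
            y_list.append(y_new)
            if len(s_list) > m:
                s_list.pop(0)
                y_list.pop(0)
        x = x_new
    print("gradient norm after 10 steps:", np.linalg.norm(A @ x))

A trust-region variant, as in the paper, would instead build B_k explicitly from the same (s, y) pairs and minimize the resulting quadratic model subject to a trust-region radius rather than taking the raw quasi-Newton step above; the choice of gamma_k then directly affects whether the model admits false negative curvature.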
Pages: 501 - 508
Page count: 8