An Efficient Fisher Matrix Approximation Method for Large-Scale Neural Network Optimization

Cited: 2
Authors
Yang, Minghan [1 ]
Xu, Dong [2 ]
Cui, Qiwen [3 ]
Wen, Zaiwen [4 ,5 ]
Xu, Pengxiang [6 ]
Affiliations
[1] DAMO Acad, Alibaba Grp, Hangzhou 310000, Peoples R China
[2] Peking Univ, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China
[3] Univ Washington, Sch Comp Sci & Engn, Seattle, WA 98195 USA
[4] Peking Univ, Coll Engn, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China
[5] Peking Univ, Ctr Machine Learning Res, Beijing 100871, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518066, Guangdong, Peoples R China
Keywords
Empirical risk minimization problems; stochastic optimization; natural gradient method; convergence; quasi-Newton method
DOI
10.1109/TPAMI.2022.3213654
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Although the shapes of the parameters are not crucial for designing first-order optimization methods for large-scale empirical risk minimization problems, they have an important impact on the size of the matrix to be inverted when developing second-order methods. In this article, we propose an efficient and novel second-order method based on the parameters in the real matrix space R^{m×n} and a matrix-product approximate Fisher matrix (MatFisher) constructed from products of gradients. The size of the matrix to be inverted is much smaller than that of the Fisher information matrix in the real vector space R^d. Moreover, by using matrix delayed updates and block-diagonal approximation techniques, the computational cost is kept under control and is comparable with that of first-order methods. Global convergence and superlinear local convergence are established under mild conditions. Numerical results on image classification with ResNet50, quantum chemistry modeling with SchNet, and data-driven solution of partial differential equations with PINNs show that our method is competitive with state-of-the-art methods.
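To illustrate why working with matrix-shaped parameters shrinks the inversion cost: for a weight matrix W in R^{m×n}, the Fisher matrix over the vectorized parameters is (mn) × (mn), whereas factors built from gradient products are only m × m and n × n. The sketch below (NumPy) shows a block-diagonal, gradient-product factored preconditioner with delayed inverse updates in that spirit; the class name, the exponential-moving-average accumulation, and the damping and update-frequency parameters are illustrative assumptions, not the paper's MatFisher construction or API.

```python
# Minimal NumPy sketch (not the paper's implementation) of a gradient-product
# factored preconditioner with block-diagonal (per-layer) structure and a
# delayed (periodic) inverse update. All names and hyperparameters here are
# illustrative assumptions.
import numpy as np

class FactoredPreconditioner:
    def __init__(self, m, n, damping=1e-3, update_freq=20, ema=0.95):
        self.L = np.eye(m)           # m x m factor, running estimate of E[G G^T]
        self.R = np.eye(n)           # n x n factor, running estimate of E[G^T G]
        self.L_inv = np.eye(m)
        self.R_inv = np.eye(n)
        self.damping = damping
        self.update_freq = update_freq   # delayed update: invert only every k steps
        self.ema = ema
        self.step = 0

    def precondition(self, grad):
        m, n = grad.shape
        # Accumulate gradient products: m x m and n x n matrices, far smaller
        # than the (mn) x (mn) Fisher matrix over the vectorized parameters.
        self.L = self.ema * self.L + (1.0 - self.ema) * grad @ grad.T
        self.R = self.ema * self.R + (1.0 - self.ema) * grad.T @ grad
        if self.step % self.update_freq == 0:
            self.L_inv = np.linalg.inv(self.L + self.damping * np.eye(m))
            self.R_inv = np.linalg.inv(self.R + self.damping * np.eye(n))
        self.step += 1
        return self.L_inv @ grad @ self.R_inv    # preconditioned gradient

# Usage: one preconditioner per weight matrix, i.e., block-diagonal across layers.
m, n, lr = 64, 32, 0.1
W = np.random.randn(m, n)
precond = FactoredPreconditioner(m, n)
for _ in range(100):
    G = np.random.randn(m, n) * 0.01        # stand-in for a stochastic gradient
    W -= lr * precond.precondition(G)       # natural-gradient-style update
```

In this sketch only the m × m and n × n factors are ever inverted, and only every update_freq steps, which is how the per-step cost can stay close to that of a first-order method.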
Pages: 5391-5403
Page count: 13
Related papers
50 in total
  • [1] Parallel Multipoint Approximation Method for Large-Scale Optimization Problems
    Gergel, Victor P.
    Barkalov, Konstantin A.
    Kozinov, Evgeny A.
    Toropov, Vassili V.
    PARALLEL COMPUTATIONAL TECHNOLOGIES, PCT 2018, 2018, 910 : 174 - 185
  • [2] Large-scale neural network method for brain computing
    Miyakawa, N
    Ichikawa, M
    Matsumoto, G
    APPLIED MATHEMATICS AND COMPUTATION, 2000, 111 (2-3) : 203 - 208
  • [3] An efficient algorithm for Kriging approximation and optimization with large-scale sampling data
    Sakata, S
    Ashida, F
    Zako, M
    COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2004, 193 (3-5) : 385 - 404
  • [4] Dynamic programming neural network for large-scale optimization problems
    Hou, Zengguang
    Wu, Cangpu
    Zidonghua Xuebao/Acta Automatica Sinica, 1999, 25 (01): : 45 - 51
  • [5] A neural network for hierarchical optimization of nonlinear large-scale systems
    Hou, ZG
    Wu, CP
    Bao, P
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 1998, 29 (02) : 159 - 166
  • [6] A hierarchical optimization neural network for large-scale dynamic systems
    Hou, ZG
    AUTOMATICA, 2001, 37 (12) : 1931 - 1940
  • [7] An optimization method of large-scale IP traffic matrix estimation
    Jiang, Dingde
    Wang, Xingwei
    Guo, Lei
    AEU-INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATIONS, 2010, 64 (07) : 685 - 689
  • [8] Large-Scale Binary Matrix Optimization for Multimicrogrids Network Structure Design
    Li, Wenhua
    Wang, Rui
    Huang, Shengjun
    Zhang, Tao
    Wang, Ling
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024, 54 (03): : 1633 - 1644
  • [9] Toward Efficient Retraining: A Large-Scale Approximate Neural Network Framework With Cross-Layer Optimization
    Yu, Tianyang
    Wu, Bi
    Chen, Ke
    Yan, Chenggang
    Liu, Weiqiang
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2024, 32 (06) : 1004 - 1017
  • [10] ON LARGE-SCALE NONLINEAR NETWORK OPTIMIZATION
    TOINT, PL
    TUYTTENS, D
    MATHEMATICAL PROGRAMMING, 1990, 48 (01) : 125 - 159