An Efficient Fisher Matrix Approximation Method for Large-Scale Neural Network Optimization

Cited by: 2
Authors
Yang, Minghan [1]
Xu, Dong [2]
Cui, Qiwen [3]
Wen, Zaiwen [4, 5]
Xu, Pengxiang [6]
Affiliations
[1] DAMO Acad, Alibaba Grp, Hangzhou 310000, Peoples R China
[2] Peking Univ, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China
[3] Univ Washington, Sch Comp Sci & Engn, Seattle, WA 98195 USA
[4] Peking Univ, Coll Engn, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China
[5] Peking Univ, Ctr Machine Learning Res, Beijing 100871, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518066, Guangdong, Peoples R China
Keywords
Empirical risk minimization problems; stochastic optimization; natural gradient method; convergence; quasi-Newton method
DOI
10.1109/TPAMI.2022.3213654
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although the shapes of the parameters are not crucial for designing first-order optimization methods in large-scale empirical risk minimization problems, they have an important impact on the size of the matrix to be inverted when developing second-order type methods. In this article, we propose an efficient and novel second-order method based on the parameters in the real matrix space $\mathbb{R}^{m \times n}$ and a matrix-product approximate Fisher matrix (MatFisher) built from products of gradients. The size of the matrix to be inverted is much smaller than that of the Fisher information matrix in the real vector space $\mathbb{R}^{d}$. Moreover, by utilizing the matrix delayed update and the block diagonal approximation techniques, the computational cost can be controlled and is comparable with that of first-order methods. A global convergence analysis and a superlinear local convergence analysis are established under mild conditions. Numerical results on image classification with ResNet50, quantum chemistry modeling with SchNet, and data-driven solution of partial differential equations with PINN illustrate that our method is quite competitive with state-of-the-art methods.
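The size argument in the abstract can be illustrated with a small sketch. The abstract does not give the MatFisher formula, so the code below assumes a simple stand-in: the average of G G^T over per-sample gradients G in R^{m x n}, plus an optional damping term. The function names, the damping parameter, and the toy gradients are all hypothetical. It only shows that a curvature matrix built from gradient products in matrix space is m x m, whereas a Fisher matrix over the flattened parameters (d = m * n) would be d x d.

```python
# Illustrative sketch only: the formula and all names here are assumptions,
# not the authors' MatFisher definition.

def matrix_product_curvature(grads, damping=1e-3):
    """Average G @ G^T over per-sample m x n gradients, plus damping * I.

    Returns an m x m matrix as a list of lists; its size does not depend on n.
    """
    m = len(grads[0])
    n = len(grads[0][0])
    curv = [[0.0] * m for _ in range(m)]
    for g in grads:
        for i in range(m):
            for j in range(m):
                curv[i][j] += sum(g[i][k] * g[j][k] for k in range(n))
    count = len(grads)
    return [
        [curv[i][j] / count + (damping if i == j else 0.0) for j in range(m)]
        for i in range(m)
    ]


def curvature_sizes(m, n):
    """Side length of the matrix to invert in each parameterization."""
    return {"vector_space": m * n, "matrix_space": m}


# Two hypothetical per-sample gradients for a 2 x 3 weight matrix.
grads = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]],
]
curv = matrix_product_curvature(grads, damping=0.0)
sizes = curvature_sizes(m=2, n=3)
```

For a layer with m = 512 and n = 4096, the same comparison gives a 512 x 512 matrix in matrix space against a 2,097,152 x 2,097,152 matrix in vector space, which is the gap the abstract refers to.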
Pages: 5391-5403
Number of pages: 13