An Efficient Fisher Matrix Approximation Method for Large-Scale Neural Network Optimization

Cited by: 2
Authors
Yang, Minghan [1]
Xu, Dong [2]
Cui, Qiwen [3]
Wen, Zaiwen [4, 5]
Xu, Pengxiang [6]
Affiliations
[1] DAMO Acad, Alibaba Grp, Hangzhou 310000, Peoples R China
[2] Peking Univ, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China
[3] Univ Washington, Sch Comp Sci & Engn, Seattle, WA 98195 USA
[4] Peking Univ, Coll Engn, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China
[5] Peking Univ, Ctr Machine Learning Res, Beijing 100871, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518066, Guangdong, Peoples R China
Keywords
Empirical risk minimization problems; stochastic optimization; natural gradient method; convergence; quasi-Newton method
DOI
10.1109/TPAMI.2022.3213654
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Although the shapes of the parameters are not crucial for designing first-order optimization methods in large-scale empirical risk minimization problems, they have an important impact on the size of the matrix to be inverted when developing second-order type methods. In this article, we propose an efficient and novel second-order method based on the parameters in the real matrix space $\mathbb{R}^{m \times n}$ and a matrix-product approximate Fisher matrix (MatFisher) built from products of gradients. The size of the matrix to be inverted is much smaller than that of the Fisher information matrix in the real vector space $\mathbb{R}^{d}$. Moreover, by utilizing the matrix delayed update and the block diagonal approximation techniques, the computational cost can be controlled and is comparable with that of first-order methods. Global convergence and superlinear local convergence analyses are established under mild conditions. Numerical results on image classification with ResNet50, quantum chemistry modeling with SchNet, and data-driven solution of partial differential equations with PINN illustrate that our method is quite competitive with state-of-the-art methods.
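The size argument in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact MatFisher definition, so the code below assumes a form (1/N) Σ G_i G_iᵀ + λI built from per-sample gradients G_i of an m × n parameter; the function names, the damping term λ, and the random gradients are all hypothetical and serve only to show why the matrix to be inverted is m × m rather than (mn) × (mn).

```python
import numpy as np

def matrix_product_fisher(grads, damping=1e-3):
    """Assumed form: (1/N) * sum_i G_i @ G_i.T + damping * I.

    grads has shape (N, m, n): N per-sample gradients of an m x n parameter.
    The result is m x m, not (m*n) x (m*n).
    """
    num_samples, m, _ = grads.shape
    fisher = np.einsum("kij,klj->il", grads, grads) / num_samples
    return fisher + damping * np.eye(m)

def preconditioned_direction(grads, damping=1e-3):
    """Solve F @ D = mean gradient for D, keeping the m x n matrix shape."""
    fisher = matrix_product_fisher(grads, damping)
    mean_grad = grads.mean(axis=0)
    return np.linalg.solve(fisher, mean_grad)

# Hypothetical sizes and random gradients, for illustration only.
rng = np.random.default_rng(0)
num_samples, m, n = 16, 8, 5
grads = rng.standard_normal((num_samples, m, n))

fisher = matrix_product_fisher(grads)
direction = preconditioned_direction(grads)

matrix_space_dim = fisher.shape[0]   # side length of the matrix inverted here
vector_space_dim = m * n             # side length if the parameter were flattened
```

With these sizes the linear system is 8 × 8, whereas a Fisher matrix over the flattened parameter would be 40 × 40. The delayed-update and block-diagonal techniques mentioned in the abstract are not modeled in this sketch.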
Pages: 5391-5403
Number of pages: 13