An Efficient Fisher Matrix Approximation Method for Large-Scale Neural Network Optimization

Cited by: 2

Authors:

Yang, Minghan [1]

Xu, Dong [2]

Cui, Qiwen [3]

Wen, Zaiwen [4,5]

Xu, Pengxiang [6]

Affiliations:

[1] DAMO Acad, Alibaba Grp, Hangzhou 310000, Peoples R China

[2] Peking Univ, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China

[3] Univ Washington, Sch Comp Sci & Engn, Seattle, WA 98195 USA

[4] Peking Univ, Coll Engn, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China

[5] Peking Univ, Ctr Machine Learning Res, Beijing 100871, Peoples R China

[6] Peng Cheng Lab, Shenzhen 518066, Guangdong, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023 / Vol. 45 / No. 05

Keywords:

Empirical risk minimization problems; stochastic optimization; natural gradient method; convergence; quasi-Newton method

DOI:

10.1109/TPAMI.2022.3213654

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Although the shapes of the parameters are not crucial for designing first-order optimization methods in large-scale empirical risk minimization problems, they have an important impact on the size of the matrix to be inverted when developing second-order type methods. In this article, we propose an efficient and novel second-order method based on the parameters in the real matrix space $\mathbb{R}^{m \times n}$ and a matrix-product approximate Fisher matrix (MatFisher) built from products of gradients. The size of the matrix to be inverted is much smaller than that of the Fisher information matrix in the real vector space $\mathbb{R}^{d}$. Moreover, by utilizing the matrix delayed update and the block diagonal approximation techniques, the computational cost can be controlled and is comparable with that of first-order methods. A global convergence analysis and a superlinear local convergence analysis are established under mild conditions. Numerical results on image classification with ResNet50, quantum chemistry modeling with SchNet, and data-driven solution of partial differential equations with PINN illustrate that our method is quite competitive with the state-of-the-art methods.

Pages: 5391 - 5403

Number of pages: 13

50 items in total

[31] NeuronLink: An Efficient Chip-to-Chip Interconnect for Large-Scale Neural Network Accelerators
Xiao, Shanlin
Guo, Yuhao
Liao, Wenkang
Deng, Huipeng
Luo, Yi
Zheng, Huanliang
Wang, Jian
Li, Cheng
Li, Gezi
Yu, Zhiyi
Xiao, Shanlin (xiaoshlin@mail.sysu.edu.cn); Yu, Zhiyi (yuzhiyi@mail.sysu.edu.cn), 1966, Institute of Electrical and Electronics Engineers Inc. (28) : 1966 - 1978
[32] POSTER: ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU Clusters
Li, Shunde
Gu, Junyu
Wang, Jue
Yao, Tiechui
Liang, Zhiqiang
Shi, Yumeng
Li, Shigang
Xi, Weiting
Li, Shushen
Zhou, Chunbao
Wang, Yangang
Chi, Xuebin
PROCEEDINGS OF THE 29TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2024, 2024, : 469 - 471
[33] Autonomous and decentralized optimization of large-scale heterogeneous wireless networks by neural network dynamics
Hasegawa, Mikio
Tran, Ha Nguyen
Miyamoto, Goh
Murata, Yoshitoshi
Harada, Hiroshi
Kato, Shuzo
IEICE TRANSACTIONS ON COMMUNICATIONS, 2008, E91B (01) : 110 - 118
[34] Algorithm optimization of large-scale supply chain design based on FPGA and neural network
Li, Ting
MICROPROCESSORS AND MICROSYSTEMS, 2021, 81
[35] An efficient method for large-scale slack allocation
Joshi, Siddharth
Boyd, Stephen
ENGINEERING OPTIMIZATION, 2009, 41 (12) : 1163 - 1176
[36] An Efficient Grouping Method for Large-Scale MBIST
Yang, Rongjie
Wang, Zheng
Shen, Minghua
2024 INTERNATIONAL SYMPOSIUM OF ELECTRONICS DESIGN AUTOMATION, ISEDA 2024, 2024, : 486 - 491
[37] APPROXIMATION, ADAPTATION AND AUTOMATION CONCEPTS FOR LARGE-SCALE STRUCTURAL OPTIMIZATION
PRASAD, B
ENGINEERING OPTIMIZATION, 1983, 6 (03) : 129 - 140
[38] An Efficient Method for Large-Scale Gate Sizing
Joshi, Siddharth
Boyd, Stephen
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2008, 55 (09) : 2760 - 2773
[39] A SUBSPACE METHOD FOR LARGE-SCALE EIGENVALUE OPTIMIZATION
Kangal, Fatih
Meerbergen, Karl
Mengi, Emre
Michiels, Wim
SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2018, 39 (01) : 48 - 82
[40] EFFICIENT TREATMENT OF CONSTRAINTS IN LARGE-SCALE STRUCTURAL OPTIMIZATION
ARORA, JS
HAUG, EJ
RAJAN, SD
ENGINEERING OPTIMIZATION, 1981, 5 (02) : 105 - 120
