A mathematical framework for improved weight initialization of neural networks using Lagrange multipliers

Cited: 7
Authors
de Pater, Ingeborg [1 ]
Mitici, Mihaela [2 ]
Affiliations
[1] Delft Univ Technol, Fac Aerosp Engn, NL-2629 HS Delft, Netherlands
[2] Univ Utrecht, Fac Sci, Heidelberglaan 8, NL-3584 CS Utrecht, Netherlands
Keywords
Weight initialization; Neural network training; Linear regression; Lagrange function; Remaining useful life
DOI
10.1016/j.neunet.2023.07.035
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
A good weight initialization is crucial to accelerate the convergence of the weights in a neural network. However, training a neural network is still time-consuming, despite recent advances in weight initialization approaches. In this paper, we propose a mathematical framework for the weight initialization of the last layer of a neural network. We first analytically derive a tight constraint on the weights that accelerates the convergence of the weights during the back-propagation algorithm. We then use linear regression and Lagrange multipliers to analytically derive the optimal initial weights and initial bias of the last layer that minimize the initial training loss given the derived tight constraint. We also show that the restrictive assumption of traditional weight initialization algorithms, namely that the expected value of the weights is zero, is redundant for our approach. We first apply our proposed weight initialization approach to a Convolutional Neural Network that predicts the Remaining Useful Life of aircraft engines. The initial training and validation losses are relatively small, the weights do not get stuck in a local optimum, and the convergence of the weights is accelerated. We compare our approach with several benchmark strategies. Compared to the best-performing state-of-the-art initialization strategy (Kaiming initialization), our approach needs 34% fewer epochs to reach the same validation loss. We also apply our approach to ResNets for the CIFAR-100 dataset, combined with transfer learning. Here, the initial accuracy is already at least 53%. This gives faster weight convergence and higher test accuracy than the benchmark strategies. © 2023 Published by Elsevier Ltd.
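The abstract describes the construction only at a high level. As a rough illustration, the sketch below (Python/NumPy; all names such as init_last_layer, H, y, C, and d are hypothetical, not from the paper) initializes a regression head by equality-constrained least squares solved through the Lagrangian/KKT system, with a generic constraint C w = d standing in for the paper's derived tight constraint, which the abstract does not reproduce.

import numpy as np

def init_last_layer(H, y, C=None, d=None):
    """Return (weights, bias) minimizing ||[H 1] w - y||^2,
    optionally subject to the linear equality constraint C w = d."""
    n, k = H.shape
    A = np.hstack([H, np.ones((n, 1))])      # append a bias column to the features
    if C is None:
        # Unconstrained case: ordinary least squares.
        w_aug, *_ = np.linalg.lstsq(A, y, rcond=None)
    else:
        # Constrained case: stationarity of the Lagrangian
        #   L(w, lam) = ||A w - y||^2 + lam^T (C w - d)
        # gives the KKT system [2 A^T A, C^T; C, 0] [w; lam] = [2 A^T y; d].
        m = C.shape[0]
        K = np.block([[2.0 * A.T @ A, C.T],
                      [C, np.zeros((m, m))]])
        rhs = np.concatenate([2.0 * A.T @ y, d])
        w_aug = np.linalg.solve(K, rhs)[:k + 1]
    return w_aug[:k], w_aug[k]

# Toy usage with random stand-ins; in practice H would be the penultimate-layer
# activations from a forward pass (e.g., features of RUL inputs) and y the targets.
rng = np.random.default_rng(0)
H, y = rng.standard_normal((256, 32)), rng.standard_normal(256)
C, d = np.ones((1, 33)), np.array([1.0])     # toy constraint: weights + bias sum to 1
w, b = init_last_layer(H, y, C, d)

Solving this single linear system places the last layer at the minimizer of the initial training loss on the given features, which is the effect the abstract describes; the earlier layers would still be initialized by a standard scheme such as Kaiming initialization.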
Pages: 579-594
Page count: 16
Related Papers
50 items in total
  • [41] Efficient Design of Artificial Neural Networks using Approximate Compressors and Multipliers
    Naresh, Kattekola
    Majumdar, Shubhankar
    Sai, Y. Padma
    Sai, P. Rohith
    2021 IEEE INTERNATIONAL SYMPOSIUM ON SMART ELECTRONIC SYSTEMS (ISES 2021), 2021: 153-156
  • [42] An Improved Algorithm of Neural Networks with Cubic Spline Weight Function
    Liu Keyuan
    Li Haibin
    He Yan
    Duan Zhixin
    2010 CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-5, 2010: 2673-2677
  • [43] LAYER-WISE INTERPRETATION OF DEEP NEURAL NETWORKS USING IDENTITY INITIALIZATION
    Kubota, Shohei
    Hayashi, Hideaki
    Hayase, Tomohiro
    Uchida, Seiichi
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 3945-3949
  • [44] Producing satellite retrievals for NWP model initialization using artificial neural networks
    Kuligowski, RJ
    Barros, AP
    SECOND CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2000: 72
  • [45] An Improved Algorithm Using B-Spline Weight Functions for Training Feedforward Neural Networks
    Zhang, Daiyuan
    ADVANCES IN MECHATRONICS AND CONTROL ENGINEERING, PTS 1-3, 2013, 278-280: 1301-1304
  • [46] Effects of Weight Initialization in a Feedforward Neural Network for Classification Using a Modified Genetic Algorithm
    Nienhold, Dino
    Schwab, Kilian
    Hanne, Thomas
    Dornberger, Rolf
    2015 3RD INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL AND BUSINESS INTELLIGENCE (ISCBI 2015), 2015: 6-12
  • [47] Construction and initialization of a hidden layer of multilayer neural networks using linear programming
    Kim, LS
    CRITICAL TECHNOLOGY: PROCEEDINGS OF THE THIRD WORLD CONGRESS ON EXPERT SYSTEMS, VOLS I AND II, 1996: 986-992
  • [48] Evolution of neural networks using weight mapping
    Pujol, JCF
    Poli, R
    GECCO-99: PROCEEDINGS OF THE GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 1999: 1170-1177
  • [49] An Improved Intrusion Detection Framework Based on Artificial Neural Networks
    Hu, Liang
    Zhang, Zhen
    Tang, Huanyu
    Xie, Nannan
    2015 11TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2015: 1115-1120
  • [50] Hybridizing Artificial Neural Networks Through Feature Selection Based Supervised Weight Initialization and Traditional Machine Learning Algorithms for Improved Colon Cancer Prediction
    Sajjad Ahmed Nadeem, Malik
    Hammad Waseem, Muhammad
    Aziz, Wajid
    Habib, Usman
    Masood, Anum
    Attique Khan, Muhammad
    IEEE ACCESS, 2024, 12: 97099-97114