A mathematical framework for improved weight initialization of neural networks using Lagrange multipliers

Cited by: 7
Authors
de Pater, Ingeborg [1 ]
Mitici, Mihaela [2 ]
Affiliations
[1] Delft Univ Technol, Fac Aerosp Engn, NL-2629 HS Delft, Netherlands
[2] Univ Utrecht, Fac Sci, Heidelberglaan 8, NL-3584 CS Utrecht, Netherlands
Keywords
Weight initialization; Neural network training; Linear regression; Lagrange function; Remaining useful life;
DOI
10.1016/j.neunet.2023.07.035
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
A good weight initialization is crucial to accelerate the convergence of the weights in a neural network. However, training a neural network remains time-consuming, despite recent advances in weight initialization approaches. In this paper, we propose a mathematical framework for the weight initialization in the last layer of a neural network. We first derive analytically a tight constraint on the weights that accelerates the convergence of the weights during the back-propagation algorithm. We then use linear regression and Lagrange multipliers to analytically derive the optimal initial weights and initial bias of the last layer, which minimize the initial training loss given the derived tight constraint. We also show that the restrictive assumption of traditional weight initialization algorithms, namely that the expected value of the weights is zero, is redundant for our approach. We first apply our proposed weight initialization approach to a Convolutional Neural Network that predicts the Remaining Useful Life of aircraft engines. The initial training and validation losses are relatively small, the weights do not get stuck in a local optimum, and the convergence of the weights is accelerated. We compare our approach with several benchmark strategies. Compared to the best-performing state-of-the-art initialization strategy (Kaiming initialization), our approach needs 34% fewer epochs to reach the same validation loss. We also apply our approach to ResNets for the CIFAR-100 dataset, combined with transfer learning. Here, the initial accuracy is already at least 53%. This yields faster weight convergence and a higher test accuracy than the benchmark strategies. © 2023 Published by Elsevier Ltd.
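As a rough illustration of the last-layer initialization idea described in the abstract, the following is a minimal NumPy sketch: it sets the final layer's weights and bias by closed-form linear regression on the last hidden layer's activations, and shows how a generic linear equality constraint can be folded in with Lagrange multipliers via a KKT system. The variable names (H, y, C, d), the small ridge term, and the toy constraint are illustrative assumptions only; the paper derives a specific tight constraint on the weights, which is not reproduced here.

```python
import numpy as np


def least_squares_init(H, y, l2=1e-6):
    """Unconstrained least-squares initialization of the last layer.

    H : (n, h) activations of the last hidden layer (illustrative name).
    y : (n,)   training targets, e.g. Remaining Useful Life values.
    Returns (weights, bias) minimizing the initial mean-squared error.
    """
    n, h = H.shape
    X = np.hstack([H, np.ones((n, 1))])      # append a bias column
    A = X.T @ X + l2 * np.eye(h + 1)         # small ridge term for numerical stability
    theta = np.linalg.solve(A, X.T @ y)
    return theta[:h], theta[h]


def constrained_least_squares_init(H, y, C, d):
    """Least squares subject to a linear equality constraint C @ theta = d,
    solved with Lagrange multipliers through the KKT system.

    C and d are placeholders for any linear constraint; the 'tight'
    constraint derived in the paper is not reproduced here.
    """
    n, h = H.shape
    X = np.hstack([H, np.ones((n, 1))])
    m = C.shape[0]
    # KKT system:  [2 X^T X   C^T] [theta ]   [2 X^T y]
    #              [   C       0 ] [lambda] = [   d   ]
    K = np.block([[2 * X.T @ X, C.T],
                  [C, np.zeros((m, m))]])
    rhs = np.concatenate([2 * X.T @ y, d])
    sol = np.linalg.solve(K, rhs)
    theta = sol[:h + 1]
    return theta[:h], theta[h]


# Illustrative usage with random data standing in for real activations and targets.
rng = np.random.default_rng(0)
H = rng.normal(size=(256, 64))               # last-hidden-layer activations
y = rng.normal(size=256)                     # e.g. RUL targets
w0, b0 = least_squares_init(H, y)
C = np.ones((1, 65))                         # toy constraint: parameters sum to 1
w1, b1 = constrained_least_squares_init(H, y, C, np.array([1.0]))
```

Because the constrained minimizer is obtained from a single linear solve of the KKT system, this style of initialization costs one pass over a batch of activations rather than any gradient steps, which is consistent with the abstract's claim of a small initial training loss.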
Pages: 579-594 (16 pages)
Related papers (50 in total)
  • [31] Interval Based Weight Initialization Method for Sigmoidal Feedforward Artificial Neural Networks
    Sodhi, Sartaj Singh
    Chandra, Pravin
    2ND AASRI CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND BIOINFORMATICS, 2014, 6 : 19 - 25
  • [32] DAVINZ: Data Valuation using Deep Neural Networks at Initialization
    Wu, Zhaoxuan
    Shu, Yao
    Low, Bryan Kian Hsiang
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [33] Decentralized Principal Component Analysis by Integrating Lagrange Programming Neural Networks With Alternating Direction Method of Multipliers
    Ye, Zhonghua
    Zhu, Hong
    IEEE ACCESS, 2020, 8 : 182842 - 182852
  • [34] Exact Neural Networks from Inexact Multipliers via Fibonacci Weight Encoding
    Simon, William Andrew
    Ray, Valerian
    Levisse, Alexandre
    Ansaloni, Giovanni
    Zapater, Marina
    Atienza, David
    2021 58TH ACM/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2021, : 805 - 810
  • [35] An analysis of weight initialization methods in connection with different activation functions for feedforward neural networks
    Wong, Kit
    Dornberger, Rolf
    Hanne, Thomas
    EVOLUTIONARY INTELLIGENCE, 2024, 17 (03) : 2081 - 2089
  • [36] Volterra signal modelling using Lagrange Programming neural networks
    Chan, S
    Stathaki, T
    Constantinides, A
    NEURAL NETWORKS FOR SIGNAL PROCESSING VIII, 1998, : 264 - 273
  • [37] AN INITIALIZATION METHOD FOR FEEDFORWARD ARTIFICIAL NEURAL NETWORKS USING POLYNOMIAL BASES
    Varnava, Thanasis M.
    Meade, Andrew J., Jr.
    ADVANCES IN DATA SCIENCE AND ADAPTIVE ANALYSIS, 2011, 3 (03) : 385 - 400
  • [38] A Systematic DNN Weight Pruning Framework Using Alternating Direction Method of Multipliers
    Zhang, Tianyun
    Ye, Shaokai
    Zhang, Kaiqi
    Tang, Jian
    Wen, Wujie
    Fardad, Makan
    Wang, Yanzhi
    COMPUTER VISION - ECCV 2018, PT VIII, 2018, 11212 : 191 - 207
  • [39] Improving the Accuracy and Hardware Efficiency of Neural Networks Using Approximate Multipliers
    Ansari, Mohammad Saeed
    Mrazek, Vojtech
    Cockburn, Bruce F.
    Sekanina, Lukas
    Vasicek, Zdenek
    Han, Jie
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2020, 28 (02) : 317 - 328
  • [40] AddNet: Deep Neural Networks Using FPGA-Optimized Multipliers
    Faraone, Julian
    Kumm, Martin
    Hardieck, Martin
    Zipf, Peter
    Liu, Xueyuan
    Boland, David
    Leong, Philip H. W.
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2020, 28 (01) : 115 - 128