On the Local Hessian in Back-propagation

Cited by: 0
Authors
Zhang, Huishuai [1 ]
Chen, Wei [1 ]
Liu, Tie-Yan [1 ]
Affiliations
[1] Microsoft Research Asia, Beijing 100080, People's Republic of China
Keywords
Neural networks
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Back-propagation (BP) is the foundation for successfully training deep neural networks. However, BP sometimes has difficulty propagating the learning signal effectively through deep networks, e.g., the vanishing-gradient phenomenon. Meanwhile, BP often works well when combined with design tricks such as orthogonal initialization, batch normalization, and skip connections, yet there is no clear understanding of what is essential to the efficiency of BP. In this paper, we take one step towards clarifying this problem. We view BP as a solution of back-matching propagation, which minimizes a sequence of back-matching losses, each corresponding to one block of the network. We study the Hessian of the local back-matching loss (the local Hessian) and connect it to the efficiency of BP. It turns out that these design tricks facilitate BP by improving the spectrum of the local Hessian. In addition, the local Hessian can be used to balance the training pace of each block and to design new training algorithms. Based on a scalar approximation of the local Hessian, we propose a scale-amended SGD algorithm. We apply it to train neural networks with batch normalization and achieve favorable results over vanilla SGD, which corroborates the importance of the local Hessian from another angle.
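The abstract describes a scale-amended SGD that rescales each block's update using a scalar approximation of that block's local Hessian. The following minimal sketch illustrates the idea only; the block structure, the proxy used for the scale factor, and all function names are assumptions made here for illustration, not the authors' algorithm.

```python
import numpy as np

def hessian_scale_proxy(block_input, eps=1e-8):
    # Assumption: for a linear block z_b = W_b z_{b-1}, the local Hessian
    # with respect to W_b scales with ||z_{b-1}||^2, so the mini-batch mean
    # squared input norm is used here as a cheap scalar stand-in.
    return max(float(np.mean(np.sum(block_input ** 2, axis=1))), eps)

def scale_amended_sgd_step(weights, grads, block_inputs, lr=0.1):
    # One update: divide each block's gradient by its scalar Hessian proxy,
    # balancing the effective training pace across blocks.
    return [W - lr * g / hessian_scale_proxy(x)
            for W, g, x in zip(weights, grads, block_inputs)]

# Toy usage: two "blocks" with random weights, gradients, and batch inputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 4)) for _ in range(2)]
grads = [rng.standard_normal((4, 4)) for _ in range(2)]
inputs = [rng.standard_normal((32, 4)) for _ in range(2)]
weights = scale_amended_sgd_step(weights, grads, inputs)
```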
Pages: 11
Related Papers
50 results in total (first 10 shown)
  • [1] On the alleviation of the problem of local minima in back-propagation
    Magoulas, GD
    Vrahatis, MN
    Androulakis, GS
    NONLINEAR ANALYSIS-THEORY METHODS & APPLICATIONS, 1997, 30 (07) : 4545 - 4550
  • [2] BACK-PROPAGATION
    JONES, WP
    HOSKINS, J
BYTE, 1987, 12 (11): 155
  • [3] Back-propagation of accuracy
    Senashova, MY
    Gorban, AN
    Wunsch, DC
1997 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, 1997: 1998 - 2001
  • [4] Back-propagation with Chaos
    Fazayeli, Farideh
    Wang, Lipo
    Liu, Wen
2008 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND SIGNAL PROCESSING, VOLS 1 AND 2, 2008: 5 - 8
  • [5] Back-propagation is not efficient
    Sima, J
    NEURAL NETWORKS, 1996, 9 (06) : 1017 - 1023
  • [6] Sequential Back-Propagation
Wang, Hui
Liu, Dayou
Wang, Yafei
    Journal of Computer Science and Technology, 1994, (03) : 252 - 260
  • [7] A study on how to help back-propagation escape local minimum
    Chai, Shaobin
    Zhou, Yong
ICNC 2007: THIRD INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 1, PROCEEDINGS, 2007: 64+
  • [8] A modified back-propagation method to avoid false local minima
    Fukuoka, Y
    Matsuki, H
    Minamitani, H
    Ishida, A
    NEURAL NETWORKS, 1998, 11 (06) : 1059 - 1072
  • [9] Improving back-propagation: Epsilon-back-propagation
    Trejo, LA
    Sandoval, C
    FROM NATURAL TO ARTIFICIAL NEURAL COMPUTATION, 1995, 930 : 427 - 432
  • [10] FEATURE CONSTRUCTION FOR BACK-PROPAGATION
    PIRAMUTHU, S
    LECTURE NOTES IN COMPUTER SCIENCE, 1991, 496 : 264 - 268