Residual Based Gated Recurrent Unit

Cited by: 0
Authors
Zhang Z.-H. [1 ]
Dong F.-M. [1 ]
Hu F. [1 ]
Wu Y.-R. [1 ,2 ]
Sun S.-F. [1 ,2 ]
Affiliations
[1] College of Computer and Information Technology, China Three Gorges University, Yichang
[2] Yichang Key Laboratory of Intelligent Medicine, Yichang
Source
Zidonghua Xuebao/Acta Automatica Sinica | 2022, Vol. 48, No. 12
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; gated recurrent unit; recurrent neural networks; skip connection;
DOI
10.16383/j.aas.c190591
Abstract
Traditional recurrent neural networks are prone to the problems of vanishing gradient and degradation. Relying on the facts that non-saturated activation functions can effectively overcome the vanishing gradient problem and that the residual structure in convolutional neural networks can effectively alleviate the degradation problem, we propose a residual gated recurrent unit (Re-GRU), built on the gated recurrent unit (GRU), to alleviate both problems. Re-GRU introduces two main changes. One is to replace the activation function of the candidate hidden state in the GRU with a non-saturated activation function. The other is to introduce residual information into the candidate hidden state representation of the GRU. Using a non-saturated activation for the candidate hidden state effectively avoids the vanishing gradient, while the introduced residual information makes the network more sensitive to gradient changes, thereby alleviating the degradation problem. We conducted three kinds of experiments: image recognition, language modeling, and speech recognition. The results indicate that the proposed Re-GRU achieves higher performance than six other methods. In particular, it reaches a test-set perplexity of 23.88 on the Penn Treebank data set in the language modeling task, about half of the lowest previously reported value. © 2022 Science Press. All rights reserved.
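As a rough illustration of the two changes described in the abstract, the sketch below modifies a standard GRU cell by using a non-saturated activation (ReLU is assumed here) for the candidate hidden state and adding the previous hidden state as a residual term. The class name ReGRUCell, the choice of ReLU, and the exact placement of the residual connection are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a residual-gated recurrent unit (Re-GRU) cell.
# Assumptions (not taken from the paper): ReLU as the non-saturated
# activation, and the residual term added to the candidate state.
import torch
import torch.nn as nn


class ReGRUCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # Gate and candidate projections, as in a standard GRU cell.
        self.W_z = nn.Linear(input_size + hidden_size, hidden_size)  # update gate
        self.W_r = nn.Linear(input_size + hidden_size, hidden_size)  # reset gate
        self.W_h = nn.Linear(input_size + hidden_size, hidden_size)  # candidate state

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h_prev], dim=-1)
        z = torch.sigmoid(self.W_z(xh))   # update gate
        r = torch.sigmoid(self.W_r(xh))   # reset gate
        xrh = torch.cat([x, r * h_prev], dim=-1)
        # Re-GRU-style changes: non-saturated activation (ReLU, assumed)
        # plus a residual/skip connection from the previous hidden state.
        h_tilde = torch.relu(self.W_h(xrh)) + h_prev
        # Standard GRU interpolation between previous and candidate states.
        return (1.0 - z) * h_prev + z * h_tilde


# Usage example: one step over a batch of 4 with 16-dim inputs and 32-dim state.
cell = ReGRUCell(16, 32)
x = torch.randn(4, 16)
h = torch.zeros(4, 32)
h_next = cell(x, h)
print(h_next.shape)  # torch.Size([4, 32])
```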
Pages: 3067-3074
Page count: 7