A Parallel Optimization Method of Deep Learning Model for Image Recognition

Cited by: 0
Authors
Ju T. [1 ]
Zhao Y. [1 ]
Liu S. [1 ]
Yang Y. [1 ]
Yang W. [1 ]
Affiliations
[1] School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou
Keywords
deep learning; image recognition; parallel optimization; parameter server;
DOI
10.7652/xjtuxb202301014
Abstract
To address the problem of image recognition in machine learning, this paper studies a parallel optimization method for image recognition on a cluster parallel system, building on existing image recognition methods. The parameter update mechanism of distributed stochastic gradient descent is improved by introducing a parameter server. On the one hand, the gradients computed by the Worker nodes are sparsified to reduce the communication load between the Worker nodes and the parameter server node. On the other hand, instead of sending updated model parameters from the parameter server node to the Worker nodes, the parameter server sends the accumulated gradient, which is likewise sparsified to further reduce the communication load. In addition, to compensate for the loss of training accuracy caused by sparsification, a momentum correction method is introduced to improve the accuracy of the image recognition model. Experimental results show that, compared with the basic asynchronous stochastic gradient descent algorithm (ASGD), the proposed parallel optimization method improves the training speed of the deep learning image recognition model by 2.95 times on average and the test accuracy by 4.6% on average under three different compression rates. © 2023 Xi'an Jiaotong University. All rights reserved.
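The abstract describes bidirectional gradient sparsification between Worker nodes and the parameter server, plus a momentum correction to offset the accuracy loss from sparsification. The sketch below is an editorial illustration only, not the authors' implementation: it assumes a top-k sparsification scheme with local accumulation of untransmitted gradient mass and a momentum correction in the style of gradient compression methods. All names (sparsify_topk, Worker.push, ParameterServer.pull) and hyperparameters (compression_rate, momentum, lr) are hypothetical.

```python
import numpy as np

def sparsify_topk(grad: np.ndarray, compression_rate: float):
    """Keep only the largest-magnitude entries of an accumulated gradient.

    Returns the sparse tensor that is actually transmitted and the residual
    that stays local and is folded into the next update.
    """
    k = max(1, int(grad.size * compression_rate))
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # positions of the top-k entries
    sent = np.zeros_like(flat)
    sent[idx] = flat[idx]                          # entries transmitted over the network
    residual = flat - sent                         # entries withheld for later steps
    return sent.reshape(grad.shape), residual.reshape(grad.shape)

class Worker:
    """Worker node: sparsifies its gradients before pushing them to the server."""

    def __init__(self, shape, compression_rate=0.01, momentum=0.9):
        self.velocity = np.zeros(shape)     # local momentum buffer
        self.accumulated = np.zeros(shape)  # gradient mass not yet transmitted
        self.rate = compression_rate
        self.momentum = momentum

    def push(self, grad):
        # Momentum correction: apply momentum locally *before* sparsification,
        # so entries withheld for several steps still carry their momentum.
        self.velocity = self.momentum * self.velocity + grad
        self.accumulated += self.velocity
        sent, self.accumulated = sparsify_topk(self.accumulated, self.rate)
        self.velocity[sent != 0] = 0.0      # stop re-applying momentum to sent entries
        return sent                          # sparse update pushed to the parameter server

class ParameterServer:
    """Parameter server node: applies sparse worker updates and broadcasts a
    sparsified accumulated gradient instead of full model parameters."""

    def __init__(self, shape, lr=0.1, compression_rate=0.01):
        self.params = np.zeros(shape)
        self.accumulated = np.zeros(shape)  # parameter change not yet broadcast
        self.lr = lr
        self.rate = compression_rate

    def pull(self, sparse_grad):
        delta = self.lr * sparse_grad
        self.params -= delta                # apply the worker's sparse update
        self.accumulated += delta           # record the change since the last broadcast
        sent, self.accumulated = sparsify_topk(self.accumulated, self.rate)
        return sent                          # sparse delta broadcast back to the workers
```

In this sketch the compression rate is the fraction of entries transmitted per exchange; the three compression rates reported in the abstract would correspond to three choices of this parameter, and each Worker would subtract the broadcast sparse delta from its local copy of the model parameters.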
Pages: 141-151
Page count: 10