Packet Routing Method for Multi-Stage Networks Based on Reinforcement Learning

Cited: 0
Authors
Gao Y. [1 ]
Luo L. [1 ]
Sun G. [1 ]
Affiliations
[1] Key Lab of Optical Fiber Sensing and Communications, University of Electronic Science and Technology of China, Chengdu
Keywords
Cluster network; Packet routing; Policy iteration; Reinforcement learning
DOI
10.12178/1001-0548.2021260
Abstract
Multi-stage networks are widely used in machine learning clusters. Because a multi-stage network offers a large number of available paths, packet routing in it is a combinatorial optimization problem. Existing heuristic routing algorithms lack performance guarantees, which can seriously inflate packet transmission delay. This paper proposes a reinforcement-learning-based packet routing method for multi-stage networks that uses a novel policy iteration algorithm to learn an optimal routing policy. In the policy evaluation step, the algorithm uses a maximum likelihood estimator of the value function, which overcomes the low sample efficiency of the Monte Carlo (MC) and temporal-difference (TD) value estimators commonly used in reinforcement learning. To cope with the high computational complexity of the combinatorial optimization problem in the policy improvement step, the algorithm decomposes the optimization over a combinatorial action space into a sequential optimization of each component action. Experiments on the NS-3 network simulator show that the learned routing policy reduces the average packet transmission delay by 13.9% compared with the best existing routing heuristics. Copyright ©2022 Journal of University of Electronic Science and Technology of China. All rights reserved.
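
The two ideas in the abstract, model-based (maximum likelihood) policy evaluation and sequential decomposition of a combinatorial action, can be illustrated with a minimal Python sketch on a small tabular MDP. This is not the paper's implementation; all names (estimate_model, evaluate_policy, improve_sequentially, n_states, n_ports, n_stages) are illustrative assumptions.

import numpy as np

def estimate_model(transitions, n_states, n_actions):
    """Maximum-likelihood estimates of P(s' | s, a) and r(s, a) from sampled
    (s, a, r, s') tuples; evaluating the policy under this estimated model
    replaces the MC/TD value estimators mentioned in the abstract."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sums = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    for s, a, r, s2 in transitions:
        counts[s, a, s2] += 1.0
        reward_sums[s, a] += r
        visits[s, a] += 1.0
    visits = np.maximum(visits, 1.0)          # avoid division by zero
    return counts / visits[:, :, None], reward_sums / visits

def evaluate_policy(policy, P, R, gamma=0.99):
    """Policy evaluation in closed form under the estimated model:
    solve (I - gamma * P_pi) V = R_pi for the value function V."""
    n = P.shape[0]
    P_pi = P[np.arange(n), policy]            # (n_states, n_states)
    R_pi = R[np.arange(n), policy]            # (n_states,)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

def improve_sequentially(q_row, n_stages, n_ports, sweeps=2):
    """Policy improvement over a combinatorial action a = (a_1, ..., a_K) by
    coordinate ascent: optimize one stage's port choice at a time instead of
    enumerating all n_ports ** n_stages composite actions."""
    def encode(a):                            # composite action -> flat index
        return sum(ak * n_ports ** k for k, ak in enumerate(a))
    a = [0] * n_stages
    for _ in range(sweeps):
        for k in range(n_stages):
            a[k] = max(range(n_ports),
                       key=lambda p: q_row[encode(a[:k] + [p] + a[k + 1:])])
    return a

Under these assumptions, a full iteration would alternate evaluate_policy with improve_sequentially applied to Q(s, ·) = R[s, ·] + gamma * P[s, ·] @ V for each state, repeating until the policy stabilizes; the coordinate-ascent step reduces the per-state improvement cost from exponential to linear in the number of stages.
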
Pages: 200-206
Page count: 6