Multi-Agent Deep Reinforcement Learning for Multi-Echelon Inventory Management

Authors
Liu, Xiaotian [1]
Hu, Ming [2]
Peng, Yijie [3]
Yang, Yaodong [4]
Affiliations
[1] Peking Univ, Guanghua Sch Management, Beijing, Peoples R China
[2] Univ Toronto, Rotman Sch Management, Toronto, ON M5S 3E6, Canada
[3] Peking Univ, PKU Wuhan Inst Artificial Intelligence, Guanghua Sch Management, Xiangjiang Lab, Beijing, Peoples R China
[4] Peking Univ, Inst Artificial Intelligence, PKU Wuhan Inst Artificial Intelligence, Beijing, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada; U.S. National Science Foundation
Keywords
Multi-Echelon Inventory Management; Multi-Agent Reinforcement Learning; Bullwhip Effect; Optimal Policies; Optimality
DOI
10.1177/10591478241305863
Chinese Library Classification number
T [Industrial Technology]
Subject classification code
08
Abstract
We apply heterogeneous-agent proximal policy optimization (HAPPO), a multi-agent deep reinforcement learning (MADRL) algorithm, to decentralized multi-echelon inventory management problems in both a serial supply chain and a supply chain network. We also examine whether the upfront-only information-sharing mechanism used in MADRL helps alleviate the bullwhip effect. Our results show that policies constructed by HAPPO achieve lower overall costs than policies constructed by single-agent deep reinforcement learning and other heuristic policies. Applying HAPPO also produces a less pronounced bullwhip effect than policies constructed by single-agent deep reinforcement learning, where information is not shared among actors. Somewhat surprisingly, HAPPO achieves lower overall costs when each actor's minimization target is a combination of its own costs and the overall costs of the system than when each actor minimizes the overall system costs alone. Our results provide a new perspective on how information sharing inside the supply chain helps alleviate the bullwhip effect and improve the overall performance of the system. When applying MADRL, upfront information sharing and action coordination among actors during model training are both essential for improving a supply chain's overall performance, with the former being even more important. Neither fully self-interested actors nor fully system-focused actors lead to the best practical performance of the policies learned and constructed by MADRL. Our results also verify MADRL's potential for solving various multi-echelon inventory management problems with complex supply chain structures and in non-stationary market environments.
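The abstract states that each actor's best-performing minimization target mixes its own costs with the overall system costs, but it does not give the exact form of that combination. The sketch below assumes a simple convex combination controlled by a hypothetical `weight` parameter (1.0 = fully self-interested, 0.0 = fully system-focused). It also includes one common way to quantify the bullwhip effect, the ratio of order variance to demand variance; the paper may use a different measure. Function names and the example numbers are illustrative assumptions, not the authors' implementation.

```python
from statistics import pvariance
from typing import Sequence


def mixed_cost_target(own_costs: Sequence[float], agent: int, weight: float) -> float:
    """Hypothetical per-actor target: weight * own cost + (1 - weight) * system cost.

    ASSUMPTION: a convex combination; the source only says the target is
    "a combination of its own costs and the overall costs of the system".
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    system_cost = sum(own_costs)  # overall cost of the whole supply chain
    return weight * own_costs[agent] + (1.0 - weight) * system_cost


def bullwhip_ratio(orders: Sequence[float], demand: Sequence[float]) -> float:
    """Variance of an echelon's orders divided by the variance of the demand it faces.

    A ratio above 1 means the echelon amplifies demand variability upstream.
    """
    demand_var = pvariance(demand)
    if demand_var == 0:
        raise ValueError("demand variance is zero; ratio is undefined")
    return pvariance(orders) / demand_var


if __name__ == "__main__":
    # Illustrative per-period costs for a three-echelon serial chain
    # (e.g. retailer, distributor, manufacturer); system cost = 10.0.
    costs = [3.0, 5.0, 2.0]
    print(mixed_cost_target(costs, agent=0, weight=0.5))  # 6.5
    # Orders swing more widely than demand: variance 9.0 vs 1.0.
    print(bullwhip_ratio([2.0, 8.0, 2.0, 8.0], [4.0, 6.0, 4.0, 6.0]))  # 9.0
```

Setting `weight` strictly between 0 and 1 corresponds to the intermediate regime the abstract reports as performing best in practice; the endpoints reproduce the purely self-interested and purely system-focused targets.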
Pages: 21