Multi-Agent Constrained Policy Optimization for Conflict-Free Management of Connected Autonomous Vehicles at Unsignalized Intersections

Cited by: 3
Authors
Zhao, Rui [1]
Li, Yun [2]
Gao, Fei [3]
Gao, Zhenhai [3]
Zhang, Tianyao [3]
Affiliations
[1] Jilin Univ, Coll Automot Engn, Changchun 130025, Peoples R China
[2] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138654, Japan
[3] Jilin Univ, State Key Lab Automot Simulat & Control, Changchun 130025, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Safety; Computational efficiency; Trajectory; Autonomous vehicles; Roads; Collaboration; Vehicle dynamics; Conflict-free management; connected autonomous vehicles; safety reinforcement learning; multi-agent constrained policy optimization; unsignalized intersections; AUTOMATED VEHICLES; SYSTEM;
DOI
10.1109/TITS.2023.3331723
Chinese Library Classification number
TU [Building Science];
Discipline classification code
0813;
Abstract
Autonomous Intersection Management (AIM) systems present a new paradigm for conflict-free cooperation of connected autonomous vehicles (CAVs) at road intersections, with the aim of eliminating collisions while improving traffic efficiency and ride comfort. Because current centralized coordination methods struggle to balance high computational efficiency with robust safety assurance, this paper proposes a conflict-free management scheme for CAVs at unsignalized intersections that leverages safe multi-agent deep reinforcement learning (MADRL). First, we formulate the safe MADRL problem as a constrained Markov game (CMG) and then cast the AIM problem as a CMG by carefully designing the state, action, reward, and cost functions. Next, we propose Multi-Agent Constrained Policy Optimization (MACPO), tailored to solving the CMG problem. MACPO incorporates safety constraints that further restrict the trust region defined by the Kullback-Leibler (KL) divergence, so that policy updates maximize performance while keeping constraint costs within their limits. This leads to the MACPO-based AIM algorithm. Finally, we train an AIM policy and compare its computation time, ride comfort, traffic efficiency, and safety with management schemes based on Model Predictive Control (MPC), Mixed Integer Programming (MIP), and non-safety-aware reinforcement learning (RL). Compared with the MPC and MIP methods, our method increases computational efficiency by 65.22 times and 731.52 times, respectively, and improves traffic efficiency by 2.41 times and 1.80 times, respectively. In contrast to the non-safety-aware RL methods, our method achieves a zero collision rate for the first time while also enhancing ride comfort, highlighting the advantages of using MACPO.
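As a rough illustration of what "safety constraints that further restrict the trust region" can mean, the block below sketches a generic constrained trust-region update in the style of constrained policy optimization, written per agent. It is not taken from the paper: the symbols (agent index i, joint policy \pi^k at iteration k, reward advantage A_r, cost advantage A_c^i, expected cost J_c^i, cost limit d_i, KL radius \delta, discount \gamma, discounted state distribution \rho) are all assumptions introduced here, and the paper's exact surrogate objective and constraint set may differ.

```latex
% Generic per-agent constrained trust-region update (illustrative sketch only).
% All symbols are assumptions of this sketch, not the paper's notation.
\begin{aligned}
\pi_i^{k+1} = \arg\max_{\pi_i} \quad
  & \mathbb{E}_{s \sim \rho_{\pi^k},\, a \sim \pi_i}
    \left[ A_r^{\pi^k}(s, a) \right] \\
\text{s.t.} \quad
  & J_c^i(\pi^k) + \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim \rho_{\pi^k},\, a \sim \pi_i}
    \left[ A_c^{i,\pi^k}(s, a) \right] \le d_i, \\
  & \mathbb{E}_{s \sim \rho_{\pi^k}}
    \left[ D_{\mathrm{KL}}\!\left( \pi_i^k(\cdot \mid s) \,\|\, \pi_i(\cdot \mid s) \right) \right]
    \le \delta .
\end{aligned}
```

In this reading, the KL bound alone defines the usual trust region, and the cost inequality removes the part of that region where the predicted constraint cost would exceed its limit, so an accepted update has to satisfy both conditions.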
Pages: 5374-5388
Number of pages: 15