A batch reinforcement learning approach to vacant taxi routing

Cited: 7
Authors
Yu, Xinlian [1 ]
Gao, Song [2 ]
Affiliations
[1] Southeast Univ, Sch Transportat, Nanjing, Peoples R China
[2] Univ Massachusetts, Dept Civil & Environm Engn, Amherst, MA USA
Keywords
Vacant taxi routing; Markov decision process; Batch reinforcement learning; Fitted Q-iteration; MARKOV DECISION-PROCESS; GO; FRAMEWORK; NETWORKS; FLEET; MODEL; GAME;
DOI
10.1016/j.trc.2022.103640
Chinese Library Classification (CLC)
U [Transportation]
Subject classification code
08; 0823
Abstract
The optimal routing of a single vacant taxi is formulated as a Markov Decision Process (MDP) problem to account for profit maximization over a full working period in a transportation network. A batch offline reinforcement learning (RL) method is proposed to learn action values and the optimal policy from archived trajectory data. The method is model-free, in that no state transition model is needed, and it is more efficient than commonly used online RL methods that rely on interactions with a simulator, owing to batch processing and the reuse of transition experiences.
The batch RL method is evaluated on a large network of Shanghai, China, with GPS trajectories of over 12,000 taxis. Training is conducted with two datasets: a synthetic dataset, in which state transitions are generated in a simulator with a postulated system dynamics model (Yu et al., 2019) whose parameters are derived from observed data, and a real-world dataset of state transitions extracted from observed taxi trajectories.
The batch RL method is more computationally efficient, reducing training time by a factor of several dozen compared with the online Q-learning method. Its performance in terms of average profit per hour and occupancy rate is assessed in the simulator against a baseline model, the random walk, and an upper bound generated by the exact Dynamic Programming (DP) method based on the same system model as the simulator. Batch RL trained on simulated trajectories and batch RL trained on observed trajectories both outperform the random walk, and the advantage increases with the training sample size. Batch RL trained on simulated trajectories achieves 95% of the performance upper bound with 30-minute time intervals, suggesting that the model-free method is highly effective. Batch RL trained on observed data achieves around 90% of the performance upper bound with 30-minute time intervals, owing to the discrepancy between the training and evaluation environments; its performance in the real world is expected to be similarly good, since training and evaluation would then be based on the same environment.
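The record itself contains no code, but the fitted Q-iteration idea named in the keywords can be illustrated with a short, hedged sketch. The Python example below learns an action-value function from a fixed batch of archived transitions (state, action, reward, next state, end-of-shift flag), with no simulator in the loop. The zone/time-interval state encoding, the ExtraTreesRegressor, the discount factor, and all function and variable names are illustrative assumptions, not details of the authors' implementation.

```python
# Minimal fitted Q-iteration sketch for vacant-taxi routing (illustrative only).
# Assumed encoding: a state is (zone_id, time_interval); an action is the index
# of a neighbouring zone to cruise to; the reward is the net profit of the move.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, actions, n_iterations=50, gamma=0.99):
    """Learn Q(s, a) from a fixed batch of transitions (s, a, r, s_next, done)."""
    S      = np.array([t[0] for t in transitions], dtype=float)
    A      = np.array([t[1] for t in transitions], dtype=float)
    R      = np.array([t[2] for t in transitions], dtype=float)
    S_next = np.array([t[3] for t in transitions], dtype=float)
    done   = np.array([t[4] for t in transitions], dtype=float)

    X = np.column_stack([S, A])        # regression inputs: (state, action)
    targets = R.copy()                 # first iteration target: Q_1(s, a) = r
    q_model = None

    for _ in range(n_iterations):
        q_model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
        # Bootstrapped target: r + gamma * max_a' Q(s', a'), zero at end of shift.
        q_next = np.column_stack([
            q_model.predict(np.column_stack([S_next, np.full(len(S_next), a)]))
            for a in actions
        ])
        targets = R + gamma * (1.0 - done) * q_next.max(axis=1)
    return q_model

def greedy_action(q_model, state, actions):
    """Pick the cruising action with the highest estimated action value."""
    q_vals = [q_model.predict([list(state) + [a]])[0] for a in actions]
    return actions[int(np.argmax(q_vals))]
```

Because the whole batch is reused at every iteration, the regression step is where the computational savings over simulator-driven online Q-learning come from; for a finite working period as in the abstract, the discount factor could equally be set to 1 and the horizon handled entirely through the end-of-shift flag.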
Pages: 19
Related Papers
50 records in total
  • [41] DRSIR: A Deep Reinforcement Learning Approach for Routing in Software-Defined Networking
    Casas-Velasco, Daniela M.
    Rendon, Oscar Mauricio Caicedo
    da Fonseca, Nelson L. S.
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2022, 19 (04): 4807 - 4820
  • [42] A Reinforcement Learning Based Approach for Efficient Routing in Multi-FPGA Platforms
    Farooq, Umer
    Mehrez, Habib
    Hasan, Najam Ul
    SENSORS, 2025, 25 (01)
  • [43] Routing in Reinforcement Learning Markov Chains
    Moll, Maximilian
    Weller, Dominic
    OPERATIONS RESEARCH PROCEEDINGS 2021, 2022, : 409 - 414
  • [44] A Deep Reinforcement Learning Approach to Droplet Routing for Erroneous Digital Microfluidic Biochips
    Kawakami, Tomohisa
    Shiro, Chiharu
    Nishikawa, Hiroki
    Kong, Xiangbo
    Tomiyama, Hiroyuki
    Yamashita, Shigeru
    SENSORS, 2023, 23 (21)
  • [45] A reinforcement learning approach for widest path routing in software-defined networks
    Ke, Chih-Heng
    Tu, Yi-Hao
    Ma, Yi-Wei
    ICT EXPRESS, 2023, 9 (05): 882 - 889
  • [46] Routing Recovery for UAV Networks with Deliberate Attacks: A Reinforcement Learning based Approach
    He, Sijie
    Jia, Ziye
    Dong, Chao
    Wang, Wei
    Cao, Yilu
    Yang, Yang
    Wu, Qihui
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 952 - 957
  • [47] Congestion-Aware Routing in Dynamic IoT Networks: A Reinforcement Learning Approach
    Farag, Hossam
    Stefanovic, Cedomir
    2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021,
  • [48] Vacant Parking Space Detection based on Task Consistency and Reinforcement Learning
    Manh-Hung Nguyen
    Chao, Tzu-Yin
    Huang, Ching-Chun
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 2009 - 2016
  • [49] ESTIMATING TAXI-OUT TIMES WITH A REINFORCEMENT LEARNING ALGORITHM
    Balakrishna, Poornima
    Ganesan, Rajesh
    Sherry, Lance
    Levy, Benjamin S.
    DASC: 2008 IEEE/AIAA 27TH DIGITAL AVIONICS SYSTEMS CONFERENCE, VOLS 1 AND 2, 2008, : 664 - +
  • [50] Deep reinforcement learning based electric taxi service optimization
    Ye H.
    Tu W.
    Ye H.
    Mai K.
    Zhao T.
    Li Q.
    SinoMaps Press, 49: 1630 - 1639 (corresponding author: Tu, Wei, tuwei@szu.edu.cn)