A batch reinforcement learning approach to vacant taxi routing

Times Cited: 7
Authors:
Yu, Xinlian [1 ]
Gao, Song [2 ]
Affiliations:
[1] Southeast Univ, Sch Transportat, Nanjing, Peoples R China
[2] Univ Massachusetts, Dept Civil & Environm Engn, Amherst, MA USA
Keywords:
Vacant taxi routing; Markov decision process; Batch reinforcement learning; Fitted Q-iteration; MARKOV DECISION-PROCESS; GO; FRAMEWORK; NETWORKS; FLEET; MODEL; GAME;
DOI
10.1016/j.trc.2022.103640
Chinese Library Classification (CLC):
U [Transportation];
Discipline Classification Code:
08 ; 0823 ;
Abstract
The optimal routing of a single vacant taxi is formulated as a Markov Decision Process (MDP) problem to account for profit maximization over a full working period in a transportation network. A batch offline reinforcement learning (RL) method is proposed to learn action values and the optimal policy from archived trajectory data. The method is model-free, in that no state transition model is needed. It is more efficient than the commonly used online RL methods based on interactions with a simulator, owing to batch processing and the reuse of transition experiences. The batch RL method is evaluated in a large network of Shanghai, China with GPS trajectories of over 12,000 taxis. Training is conducted with two datasets: a synthetic dataset in which state transitions are generated in a simulator with a postulated system dynamics model (Yu et al., 2019) whose parameters are derived from observed data, and a dataset of real-world state transitions extracted from observed taxi trajectories. The batch RL method is more computationally efficient, reducing training time by dozens of times compared with the online Q-learning method. Its performance, in terms of average profit per hour and occupancy rate, is assessed in the simulator against a baseline model (a random walk) and an upper bound generated by the exact Dynamic Programming (DP) method based on the same system model as the simulator. The batch RL policies trained on simulated and observed trajectories both outperform the random walk, and the advantage increases with the training sample size. The batch RL trained on simulated trajectories achieves 95% of the performance upper bound with 30-minute time intervals, suggesting that the model-free method is highly effective. The batch RL trained on observed data achieves around 90% of the performance upper bound with 30-minute time intervals, owing to the discrepancy between the training and evaluation environments; its real-world performance is expected to be similarly good, since training and evaluation would then be based on the same environment.
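The Fitted Q-Iteration named in the keywords is the core of such a batch RL method: action values are learned by repeatedly regressing bootstrapped targets computed over a fixed batch of archived transitions, with no further simulator interaction. The following is a minimal sketch of that idea on a hypothetical 4-state, 2-action toy problem, using an exact tabular regressor; the states, actions, rewards, and discount factor are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions, gamma=0.95, n_iters=100):
    """Learn Q(s, a) from a fixed batch of (s, a, r, s_next, done) tuples.

    Model-free: only the archived transitions are used, never a simulator.
    The 'fit' step here sets each Q(s, a) to the mean regression target over
    the batch, i.e. Fitted Q-Iteration with an exact lookup-table regressor.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        targets = {}  # (s, a) -> list of bootstrapped targets from the old Q
        for s, a, r, s_next, done in transitions:
            y = r if done else r + gamma * Q[s_next].max()
            targets.setdefault((s, a), []).append(y)
        for (s, a), ys in targets.items():
            Q[s, a] = np.mean(ys)  # exact least-squares fit per state-action cell
    return Q

# Hypothetical batch: a 4-state chain where action 1 moves right toward a
# terminal reward at state 3, and action 0 stays put with zero reward.
batch = [
    (0, 1, 0.0, 1, False), (1, 1, 0.0, 2, False),
    (2, 1, 1.0, 3, True),  (0, 0, 0.0, 0, False),
    (1, 0, 0.0, 1, False), (2, 0, 0.0, 2, False),
]
Q = fitted_q_iteration(batch, n_states=4, n_actions=2)
policy = Q.argmax(axis=1)  # greedy policy: move right in states 0-2
```

Because every sweep reuses the whole batch, each transition contributes to many value updates, which is the source of the efficiency gain over online Q-learning that the abstract reports; in the paper's setting the tabular regressor would be fit over network-node/time states rather than this toy chain.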
Pages: 19