Online Planning for Large Markov Decision Processes with Hierarchical Decomposition

Cited by: 25
Authors
Bai, Aijun [1 ]
Wu, Feng [1 ]
Chen, Xiaoping [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Anhui, Peoples R China
Funding
National Research Foundation of Singapore; National Natural Science Foundation of China;
Keywords
Algorithms; Experimentation; MDP; online planning; MAXQ-OP; RoboCup; ROBOCUP SOCCER; REINFORCEMENT; ABSTRACTION; SEARCH;
DOI
10.1145/2717316
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Markov decision processes (MDPs) provide a rich framework for planning under uncertainty. However, exactly solving a large MDP is usually intractable due to the "curse of dimensionality": the state space grows exponentially with the number of state variables. Online algorithms tackle this problem by avoiding computing a policy for the entire state space. On the other hand, since an online algorithm has to find a near-optimal action in almost real time, its computation budget is often very limited. In the context of reinforcement learning, MAXQ is a value function decomposition method that exploits the underlying structure of the original MDP and decomposes it into a combination of smaller subproblems arranged over a task hierarchy. In this article, we present MAXQ-OP, a novel online planning algorithm for large MDPs that utilizes MAXQ hierarchical decomposition in online settings. Compared to traditional online planning algorithms, MAXQ-OP is able to reach much deeper states in the search tree with less computation time by exploiting MAXQ hierarchical decomposition online. We empirically evaluate our algorithm in the standard Taxi domain, a common benchmark for MDPs, to show the effectiveness of our approach. We have also conducted a long-term case study in a highly complex simulated soccer domain and developed a team named WrightEagle that has won five championships and five runner-up finishes in the past 10 years of annual RoboCup Soccer Simulation 2D competitions. The results in the RoboCup domain confirm the scalability of MAXQ-OP to very large domains.
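The recursive value estimation at the heart of the abstract's idea can be sketched as follows. This is a simplified illustration only, not the paper's implementation: the two-level task hierarchy, the toy states ("away", "at_goal"), the deterministic transition model, and the depth-bounded completion estimate are all hypothetical placeholders standing in for the paper's Taxi and RoboCup models.

```python
# Sketch of MAXQ-style recursive online value estimation: a composite
# task's value is the best subtask value plus a discounted "completion"
# value for finishing the task afterwards.  All models here are toys.

GAMMA = 0.9

# Task hierarchy: composite tasks list their subtasks.
HIERARCHY = {
    "Root":     ["Navigate", "Pickup"],
    "Navigate": ["North", "South"],
}
# Primitive actions map to immediate-reward functions of the state.
PRIMITIVE_REWARD = {
    "North":  lambda s: -1.0,
    "South":  lambda s: -1.0,
    "Pickup": lambda s: 10.0 if s == "at_goal" else -10.0,
}

def transition(state, action):
    """Toy deterministic model: only North reaches the goal."""
    return "at_goal" if action == "North" else "away"

def evaluate(task, state, depth):
    """Return (value, best primitive action) for `task` in `state`.

    V(primitive, s) = immediate reward
    V(composite, s) = max over subtasks a of
                      V(a, s) + GAMMA * completion(task, s')
    The completion term is approximated by recursively evaluating the
    same task in the successor state, bounded by `depth` -- this is how
    a hierarchical search reaches deep states with little computation.
    """
    if task in PRIMITIVE_REWARD:
        return PRIMITIVE_REWARD[task](state), task
    if depth == 0:
        return 0.0, None  # heuristic leaf estimate
    best_value, best_action = float("-inf"), None
    for sub in HIERARCHY[task]:
        v_sub, a_sub = evaluate(sub, state, depth - 1)
        if a_sub is not None:
            s_next = transition(state, a_sub)
            v_comp, _ = evaluate(task, s_next, depth - 1)
        else:
            v_comp = 0.0
        q = v_sub + GAMMA * v_comp
        if q > best_value:
            best_value, best_action = q, a_sub
    return best_value, best_action
```

For example, `evaluate("Root", "away", 3)` selects "North" (navigate toward the goal before picking up), while `evaluate("Root", "at_goal", 2)` selects "Pickup"; the depth bound plays the role of the limited online computation budget discussed above.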
Pages: 28
Related Papers
50 records
  • [41] Online Learning with Implicit Exploration in Episodic Markov Decision Processes
    Ghasemi, Mahsa
    Hashemi, Abolfazl
    Vikalo, Haris
    Topcu, Ufuk
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 1953 - 1958
  • [42] Path planning for palletizing robot using Hierarchical Markov Decision Process
    Liu, Jiu-Fu
    Chinese Mechanical Engineering Society, 2014, 35 (06)
  • [43] Path Planning for Palletizing Robot Using Hierarchical Markov Decision Process
    Liu, Jiu-Fu
    Gao, Lei
    Sun, Yan
    Zhou, Jian-Yong
    Liu, Wen-Liang
    Yang, Zhong
    Wu, Shu-Yan
    JOURNAL OF THE CHINESE SOCIETY OF MECHANICAL ENGINEERS, 2014, 35 (06): : 477 - 483
  • [44] Planning in Discrete and Continuous Markov Decision Processes by Probabilistic Programming
    Nitti, Davide
    Belle, Vaishak
    de Raedt, Luc
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2015, PT II, 2015, 9285 : 327 - 342
  • [45] Robust Adaptive Markov Decision Processes PLANNING WITH MODEL UNCERTAINTY
    Bertuccelli, Luca F.
    Wu, Albert
    How, Jonathan P.
    IEEE CONTROL SYSTEMS MAGAZINE, 2012, 32 (05): : 96 - 109
  • [46] Optimistic Planning for Belief-Augmented Markov Decision Processes
    Fonteneau, Raphael
    Busoniu, Lucian
    Munos, Remi
    PROCEEDINGS OF THE 2013 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING (ADPRL), 2013, : 77 - 84
  • [47] Optimistic planning in Markov decision processes using a generative model
    Szorenyi, Balazs
    Kedenburg, Gunnar
    Munos, Remi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [48] Learning and Planning in Average-Reward Markov Decision Processes
    Wan, Yi
    Naik, Abhishek
    Sutton, Richard S.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7665 - 7676
  • [49] Planning in entropy-regularized Markov decision processes and games
    Grill, Jean-Bastien
    Domingues, Omar D.
    Menard, Pierre
    Munos, Remi
    Valko, Michal
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [50] Ground Delay Program Planning Using Markov Decision Processes
    Cox, Jonathan
    Kochenderfer, Mykel J.
    JOURNAL OF AEROSPACE INFORMATION SYSTEMS, 2016, 13 (03): : 134 - 142