Online Planning for Large Markov Decision Processes with Hierarchical Decomposition

Cited by: 25

Authors:

Bai, Aijun [1]

Wu, Feng [1]

Chen, Xiaoping [1]

Affiliation:

[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Anhui, Peoples R China

Source:

ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY | 2015 / Vol. 6 / No. 4

Funding:

National Research Foundation of Singapore; National Natural Science Foundation of China;

Keywords:

Algorithms; Experimentation; MDP; online planning; MAXQ-OP; RoboCup; ROBOCUP SOCCER; REINFORCEMENT; ABSTRACTION; SEARCH;

DOI:

10.1145/2717316

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Markov decision processes (MDPs) provide a rich framework for planning under uncertainty. However, exactly solving a large MDP is usually intractable due to the "curse of dimensionality": the state space grows exponentially with the number of state variables. Online algorithms tackle this problem by avoiding the computation of a policy for the entire state space. On the other hand, because an online algorithm has to find a near-optimal action in almost real time, the available computation time is often very limited. In the context of reinforcement learning, MAXQ is a value function decomposition method that exploits the underlying structure of the original MDP and decomposes it into a combination of smaller subproblems arranged over a task hierarchy. In this article, we present MAXQ-OP, a novel online planning algorithm for large MDPs that utilizes MAXQ hierarchical decomposition in online settings. Compared to traditional online planning algorithms, MAXQ-OP is able to reach much deeper states in the search tree with less computation time by exploiting MAXQ hierarchical decomposition online. We empirically evaluate our algorithm in the standard Taxi domain, a common benchmark for MDPs, to show the effectiveness of our approach. We have also conducted a long-term case study in a highly complex simulated soccer domain and developed a team named WrightEagle that has won five world championships and finished runner-up five times in the past 10 years of the annual RoboCup Soccer Simulation 2D competitions. The results in the RoboCup domain confirm the scalability of MAXQ-OP to very large domains.
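The abstract describes MAXQ as splitting a task's value into the value of a chosen subtask plus a completion term. A minimal Python sketch of that standard MAXQ recursion follows, assuming a tiny hypothetical two-level hierarchy: Q(i, s, a) = V(a, s) + C(i, s, a); V(i, s) = max over children a of Q(i, s, a) for a composite task i; and V(a, s) = expected one-step reward for a primitive action a. The task names, state, rewards, and completion values are invented for illustration. The abstract does not say how MAXQ-OP estimates the completion term online, so a fixed lookup table stands in for it here.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level task hierarchy: each composite task lists its children.
# Any name not in HIERARCHY is treated as a primitive action.
HIERARCHY: Dict[str, List[str]] = {
    "Root": ["Navigate", "pickup"],
    "Navigate": ["north", "south"],
}

# Hypothetical expected one-step reward of each primitive action in each state.
PRIMITIVE_REWARD: Dict[Tuple[str, str], float] = {
    ("s0", "north"): -1.0,
    ("s0", "south"): -1.0,
    ("s0", "pickup"): -10.0,
}

# Hypothetical completion values C(i, s, a): expected cumulative reward for
# finishing parent task i after child a, started in state s, terminates.
COMPLETION: Dict[Tuple[str, str, str], float] = {
    ("Navigate", "s0", "north"): -2.0,
    ("Navigate", "s0", "south"): -5.0,
    ("Root", "s0", "Navigate"): 18.0,
    ("Root", "s0", "pickup"): 0.0,
}


def value(task: str, s: str) -> float:
    """V(task, s): expected reward for primitives, best Q-value for composites."""
    if task not in HIERARCHY:
        return PRIMITIVE_REWARD[(s, task)]
    return max(q_value(task, s, child) for child in HIERARCHY[task])


def q_value(task: str, s: str, child: str) -> float:
    """Q(task, s, child) = V(child, s) + C(task, s, child)."""
    return value(child, s) + COMPLETION[(task, s, child)]


def greedy_child(task: str, s: str) -> str:
    """Child of a composite task with the highest decomposed Q-value."""
    return max(HIERARCHY[task], key=lambda child: q_value(task, s, child))


def primitive_action(task: str, s: str) -> str:
    """Descend the hierarchy greedily until a primitive action is reached."""
    while task in HIERARCHY:
        task = greedy_child(task, s)
    return task


if __name__ == "__main__":
    print(value("Root", "s0"))             # 15.0
    print(primitive_action("Root", "s0"))  # north
```

With these made-up tables, Navigate scores -1 + 18 - 2 = 15 versus -10 for pickup, so the greedy descent picks Navigate and then north. The recursion re-evaluates subtasks on every call; a practical planner would cache or depth-bound it, and this sketch only illustrates how the decomposed value is assembled.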


Pages: 28
