Online Planning for Large Markov Decision Processes with Hierarchical Decomposition

Cited by: 25
Authors
Bai, Aijun [1 ]
Wu, Feng [1 ]
Chen, Xiaoping [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Anhui, Peoples R China
Funding
National Research Foundation of Singapore; National Natural Science Foundation of China;
Keywords
Algorithms; Experimentation; MDP; online planning; MAXQ-OP; RoboCup; ROBOCUP SOCCER; REINFORCEMENT; ABSTRACTION; SEARCH;
DOI
10.1145/2717316
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Markov decision processes (MDPs) provide a rich framework for planning under uncertainty. However, exactly solving a large MDP is usually intractable due to the "curse of dimensionality": the state space grows exponentially with the number of state variables. Online algorithms tackle this problem by avoiding computing a policy for the entire state space. On the other hand, since an online algorithm has to find a near-optimal action in almost real time, its computation budget is often very limited. In the context of reinforcement learning, MAXQ is a value function decomposition method that exploits the underlying structure of the original MDP and decomposes it into a combination of smaller subproblems arranged over a task hierarchy. In this article, we present MAXQ-OP, a novel online planning algorithm for large MDPs that utilizes MAXQ hierarchical decomposition in online settings. Compared to traditional online planning algorithms, MAXQ-OP is able to reach much deeper states in the search tree with less computation time by exploiting MAXQ hierarchical decomposition online. We empirically evaluate our algorithm in the standard Taxi domain, a common benchmark for MDPs, to show the effectiveness of our approach. We have also conducted a long-term case study in a highly complex simulated soccer domain and developed a team named WrightEagle that has won five championships and five runner-up finishes in the past 10 years of annual RoboCup Soccer Simulation 2D competitions. The results in the RoboCup domain confirm the scalability of MAXQ-OP to very large domains.
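The recursive value estimation at the heart of the abstract's idea can be sketched as follows. This is a simplified illustration only, not the paper's implementation: the two-level task hierarchy, the toy states ("away", "at_goal"), the deterministic transition model, and the depth-bounded completion estimate are all hypothetical placeholders standing in for the paper's Taxi and RoboCup models.

```python
# Sketch of MAXQ-style recursive online value estimation: a composite
# task's value is the best subtask value plus a discounted "completion"
# value for finishing the task afterwards.  All models here are toys.

GAMMA = 0.9

# Task hierarchy: composite tasks list their subtasks.
HIERARCHY = {
    "Root":     ["Navigate", "Pickup"],
    "Navigate": ["North", "South"],
}
# Primitive actions map to immediate-reward functions of the state.
PRIMITIVE_REWARD = {
    "North":  lambda s: -1.0,
    "South":  lambda s: -1.0,
    "Pickup": lambda s: 10.0 if s == "at_goal" else -10.0,
}

def transition(state, action):
    """Toy deterministic model: only North reaches the goal."""
    return "at_goal" if action == "North" else "away"

def evaluate(task, state, depth):
    """Return (value, best primitive action) for `task` in `state`.

    V(primitive, s) = immediate reward
    V(composite, s) = max over subtasks a of
                      V(a, s) + GAMMA * completion(task, s')
    The completion term is approximated by recursively evaluating the
    same task in the successor state, bounded by `depth` -- this is how
    a hierarchical search reaches deep states with little computation.
    """
    if task in PRIMITIVE_REWARD:
        return PRIMITIVE_REWARD[task](state), task
    if depth == 0:
        return 0.0, None  # heuristic leaf estimate
    best_value, best_action = float("-inf"), None
    for sub in HIERARCHY[task]:
        v_sub, a_sub = evaluate(sub, state, depth - 1)
        if a_sub is not None:
            s_next = transition(state, a_sub)
            v_comp, _ = evaluate(task, s_next, depth - 1)
        else:
            v_comp = 0.0
        q = v_sub + GAMMA * v_comp
        if q > best_value:
            best_value, best_action = q, a_sub
    return best_value, best_action
```

For example, `evaluate("Root", "away", 3)` selects "North" (navigate toward the goal before picking up), while `evaluate("Root", "at_goal", 2)` selects "Pickup"; the depth bound plays the role of the limited online computation budget discussed above.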
Pages: 28
Related Papers
50 records
  • [41] Online Learning with Implicit Exploration in Episodic Markov Decision Processes
    Ghasemi, Mahsa
    Hashemi, Abolfazl
    Vikalo, Haris
    Topcu, Ufuk
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 1953 - 1958
  • [42] Path planning for palletizing robot using Hierarchical Markov Decision Process
    Liu, Jiu-Fu
    Chinese Mechanical Engineering Society, 2014, 35 (06)
  • [43] Path Planning for Palletizing Robot Using Hierarchical Markov Decision Process
    Liu, Jiu-Fu
    Gao, Lei
    Sun, Yan
    Zhou, Jian-Yong
    Liu, Wen-Liang
    Yang, Zhong
    Wu, Shu-Yan
    JOURNAL OF THE CHINESE SOCIETY OF MECHANICAL ENGINEERS, 2014, 35 (06): : 477 - 483
  • [44] Planning in Discrete and Continuous Markov Decision Processes by Probabilistic Programming
    Nitti, Davide
    Belle, Vaishak
    de Raedt, Luc
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2015, PT II, 2015, 9285 : 327 - 342
  • [45] Robust Adaptive Markov Decision Processes PLANNING WITH MODEL UNCERTAINTY
    Bertuccelli, Luca F.
    Wu, Albert
    How, Jonathan P.
    IEEE CONTROL SYSTEMS MAGAZINE, 2012, 32 (05): : 96 - 109
  • [46] Optimistic Planning for Belief-Augmented Markov Decision Processes
    Fonteneau, Raphael
    Busoniu, Lucian
    Munos, Remi
    PROCEEDINGS OF THE 2013 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING (ADPRL), 2013, : 77 - 84
  • [47] Optimistic planning in Markov decision processes using a generative model
    Szorenyi, Balazs
    Kedenburg, Gunnar
    Munos, Remi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [48] Learning and Planning in Average-Reward Markov Decision Processes
    Wan, Yi
    Naik, Abhishek
    Sutton, Richard S.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7665 - 7676
  • [49] Planning in entropy-regularized Markov decision processes and games
    Grill, Jean-Bastien
    Domingues, Omar D.
    Menard, Pierre
    Munos, Remi
    Valko, Michal
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [50] Ground Delay Program Planning Using Markov Decision Processes
    Cox, Jonathan
    Kochenderfer, Mykel J.
    JOURNAL OF AEROSPACE INFORMATION SYSTEMS, 2016, 13 (03): : 134 - 142