Monte Carlo Hierarchical Model Learning

Cited: 0
|
Authors
Menashe, Jacob [1]
Stone, Peter [1]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
Keywords
Single and multi-agent learning techniques; Reinforcement Learning; Factored Domains; Model Learning; Hierarchical Skill Learning; Monte Carlo Methods;
DOI
none
CLC number
TP [Automation Technology; Computer Technology];
Discipline code
0812 ;
Abstract
Reinforcement learning (RL) is a well-established paradigm for enabling autonomous agents to learn from experience. To enable RL to scale to any but the smallest domains, it is necessary to make use of abstraction and generalization of the state-action space, for example with a factored representation. However, to make effective use of such a representation, it is necessary to determine which state variables are relevant in which situations. In this work, we introduce T-UCT, a novel model-based RL approach for learning and exploiting the dynamics of structured hierarchical environments. When learning the dynamics while acting, a partial or inaccurate model may do more harm than good. T-UCT uses graph-based planning and Monte Carlo simulations to exploit models that may be incomplete or inaccurate, allowing it to both maximize cumulative rewards and ignore trajectories that are unlikely to succeed. T-UCT incorporates new experiences in the form of more accurate plans that span a greater area of the state space. T-UCT is fully implemented and compared empirically against B-VISA, the best known prior approach to the same problem. We show that T-UCT learns hierarchical models with fewer samples than B-VISA and that this effect is magnified at deeper levels of hierarchical complexity.
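The abstract names Monte Carlo simulation and graph-based planning but, as a two-page extended abstract, gives no pseudocode. For orientation, below is a minimal sketch of the generic UCT planning loop that T-UCT builds on (UCB1 action selection inside a simulated tree, random rollouts at the leaves). The function names, the toy chain model, and all parameter values are illustrative assumptions, not details from the paper.

```python
import math
import random

def uct_plan(root, step, actions, n_sims=300, horizon=8, c=1.4):
    """Choose an action at `root` by UCT: Monte Carlo simulations of the
    model `step` grow a search tree whose (state, action) statistics are
    updated online and steer exploration via the UCB1 rule."""
    N, Na, Q = {}, {}, {}  # state visits, (s, a) visits, (s, a) value means

    def rollout(s, depth):
        # Default policy: random actions until the planning horizon.
        total = 0.0
        for _ in range(depth):
            s, r = step(s, random.choice(actions))
            total += r
        return total

    def simulate(s, depth):
        if depth == 0:
            return 0.0
        if s not in N:  # expand a new leaf, then evaluate it by rollout
            N[s] = 0
            for a in actions:
                Na[(s, a)], Q[(s, a)] = 0, 0.0
            return rollout(s, depth)
        # UCB1 selection: exploit high-value actions, explore rare ones.
        a = max(actions, key=lambda b: Q[(s, b)] + c * math.sqrt(
            math.log(N[s] + 1) / (Na[(s, b)] + 1e-9)))
        s2, r = step(s, a)
        ret = r + simulate(s2, depth - 1)
        N[s] += 1
        Na[(s, a)] += 1
        Q[(s, a)] += (ret - Q[(s, a)]) / Na[(s, a)]  # incremental mean
        return ret

    for _ in range(n_sims):
        simulate(root, horizon)
    return max(actions, key=lambda b: Q[(root, b)])

# Hypothetical toy model: a 4-state chain; 'right' moves toward goal
# state 3, which pays reward 1 and resets the agent to state 0.
def chain_step(s, a):
    s2 = min(s + 1, 3) if a == 'right' else max(s - 1, 0)
    return (0, 1.0) if s2 == 3 else (s2, 0.0)

random.seed(0)
print(uct_plan(0, chain_step, ['left', 'right']))
```

T-UCT's contribution, per the abstract, lies in combining this kind of Monte Carlo search with learned hierarchical models of factored dynamics, so that incomplete or inaccurate models can still be exploited safely; that machinery is not reproduced here.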
Pages: 1985 - 1986
Page count: 2
Related papers (50 total)
  • [21] Learning Hamiltonian Monte Carlo in R
    Thomas, Samuel
    Tu, Wanzhu
    AMERICAN STATISTICIAN, 2021, 75 (04): 403 - 413
  • [22] Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models
    Dellaportas, P
    Forster, JJ
    BIOMETRIKA, 1999, 86 (03) : 615 - 633
  • [23] Markov Chain Monte Carlo model composition search strategy for quantitative trait loci in a Bayesian hierarchical model
    Simmons, Susan J.
    Fang, Fang
    Fang, Qijun
    Ricanek, Karl
    World Academy of Science, Engineering and Technology, 2010, 63 : 58 - 61
  • [24] A Game Model for Gomoku Based on Deep Learning and Monte Carlo Tree Search
    Li, Xiali
    He, Shuai
    Wu, Licheng
    Chen, Daiyao
    Zhao, Yue
    PROCEEDINGS OF 2019 CHINESE INTELLIGENT AUTOMATION CONFERENCE, 2020, 586 : 88 - 97
  • [25] Continuous Time Quantum Monte Carlo in Combination with Machine Learning on the Hubbard Model
    Hunpyo Lee
    Journal of the Korean Physical Society, 2019, 75 : 841 - 844
  • [26] Learning model-free robot control by a Monte Carlo EM algorithm
    Nikos Vlassis
    Marc Toussaint
    Georgios Kontes
    Savas Piperidis
    Autonomous Robots, 2009, 27 : 123 - 130
  • [27] IPA MODEL OF NEURAL NETWORK AND ITS MONTE-CARLO LEARNING ALGORITHM
    MU, G
    LU, M
    ZHAN, Y
    OPTIK, 1991, 89 (01): 11 - 14
  • [28] An analysis of mode transition model from play to learning with Monte Carlo Simulation
    Kotani, Takuya
    Kaburagi, Makoto
    2006 7TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY BASED HIGHER EDUCATION AND TRAINING, VOLS 1 AND 2, 2006, : 626 - 630