Planning and Learning with Adaptive Lookahead

Authors
Rosenberg, Aviv [1]
Hallak, Assaf [2]
Mannor, Shie [2, 3]
Chechik, Gal [2, 4]
Dalal, Gal [2]
Affiliations
[1] Amazon Science, Seattle, WA, USA
[2] NVIDIA Research, Santa Clara, CA, USA
[3] Technion, Haifa, Israel
[4] Bar-Ilan University, Ramat Gan, Israel
Keywords
Shogi; Chess; Go
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.
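The abstract describes choosing the lookahead depth per state as a function of the current value estimate. The following is a minimal sketch of that general idea for a small tabular MDP; it is not the authors' algorithm. The selection rule (take a deep lookahead wherever the one-step Bellman residual exceeds a threshold), the two depths, the threshold, and all function names are hypothetical assumptions chosen for illustration.

```python
import numpy as np


def bellman_backup(V, P, R, gamma):
    """One application of the Bellman optimality operator T.

    P has shape (A, S, S) with P[a, s, t] = Pr(t | s, a); R has shape (S, A).
    """
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return Q.max(axis=1)


def h_step_backup(V, P, R, gamma, h):
    """Apply T h times, i.e. an h-step lookahead backup T^h V."""
    for _ in range(h):
        V = bellman_backup(V, P, R, gamma)
    return V


def adaptive_lookahead_vi(P, R, gamma, shallow_depth=1, deep_depth=4,
                          threshold=0.1, tol=1e-8, max_iters=1000):
    """Value iteration with a per-state lookahead depth (hypothetical rule)."""
    assert 1 <= shallow_depth <= deep_depth
    V = np.zeros(R.shape[0])
    for it in range(1, max_iters + 1):
        TV = bellman_backup(V, P, R, gamma)
        # Hypothetical selection rule: states whose current estimate is far
        # from its one-step backup get the deeper lookahead.
        use_deep = np.abs(TV - V) > threshold
        # TV already holds one backup, so apply depth - 1 more to get T^depth V.
        V_shallow = h_step_backup(TV, P, R, gamma, shallow_depth - 1)
        V_deep = h_step_backup(TV, P, R, gamma, deep_depth - 1)
        # Each state takes the backup of its selected depth.
        V_new = np.where(use_deep, V_deep, V_shallow)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, it
        V = V_new
    return V, max_iters
```

Because every state's update applies T^h with h ≥ 1, the combined update is still a γ-contraction in the sup norm, so the iteration converges to the optimal value regardless of which states take the deep branch. The rule only trades fewer iterations against more computation per iteration. For brevity the sketch computes the deep backup for all states and then masks it; a practical planner would expand only the selected states.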
Pages: 9606-9613
Page count: 8