Planning and Learning with Adaptive Lookahead

Cited by: 0
Authors
Rosenberg, Aviv [1]
Hallak, Assaf [2]
Mannor, Shie [2, 3]
Chechik, Gal [2, 4]
Dalal, Gal [2]
Affiliations
[1] Amazon Sci, Seattle, WA USA
[2] Nvidia Res, Santa Clara, CA USA
[3] Technion, Haifa, Israel
[4] Bar Ilan Univ, Ramat Gan, Israel
Keywords
SHOGI; CHESS; GO;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.
Pages: 9606 - 9613
Number of pages: 8