Planning and Learning with Adaptive Lookahead

Authors
Rosenberg, Aviv [1]
Hallak, Assaf [2]
Mannor, Shie [2, 3]
Chechik, Gal [2, 4]
Dalal, Gal [2]
Affiliations
[1] Amazon Sci, Seattle, WA USA
[2] Nvidia Res, Santa Clara, CA USA
[3] Technion, Haifa, Israel
[4] Bar Ilan Univ, Ramat Gan, Israel
Keywords
Shogi; Chess; Go
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.
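The core idea in the abstract, choosing the planning depth per state from the current value estimate, can be illustrated with a minimal sketch. This is not the authors' algorithm: the tabular chain MDP, the function names, and the depth rule (plan deeper wherever the one-step Bellman residual of the estimate exceeds a threshold) are all assumptions made for illustration.

```python
import numpy as np

def lookahead_values(P, R, V, gamma, depth):
    """Q-values of a depth-step lookahead with V as the leaf estimate.

    P: (A, S, S) transition probabilities, R: (S, A) rewards, V: (S,) values.
    """
    W = V.copy()
    # Apply (depth - 1) Bellman optimality backups to the leaf values.
    for _ in range(depth - 1):
        Q = R + gamma * np.einsum("ast,t->sa", P, W)
        W = Q.max(axis=1)
    # A final backup yields action values at the root.
    return R + gamma * np.einsum("ast,t->sa", P, W)

def choose_depths(P, R, V, gamma, threshold, shallow=1, deep=3):
    """Hypothetical rule: plan deeper where the one-step residual is large."""
    one_step = lookahead_values(P, R, V, gamma, 1).max(axis=1)
    residual = np.abs(one_step - V)
    return np.where(residual > threshold, deep, shallow)

def adaptive_greedy_policy(P, R, V, gamma, depths):
    """Greedy action per state, each state using its own lookahead depth."""
    policy = np.empty(len(V), dtype=int)
    for d in np.unique(depths):
        Q = lookahead_values(P, R, V, gamma, int(d))
        mask = depths == d
        policy[mask] = Q[mask].argmax(axis=1)
    return policy

# Toy deterministic chain: states 0..3, actions 0 (left) and 1 (right).
# State 3 is absorbing; moving right from state 2 pays a reward of 1.
n_states, n_actions = 4, 2
P = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    left = s if s == n_states - 1 else max(s - 1, 0)
    right = min(s + 1, n_states - 1)
    P[0, s, left] = 1.0
    P[1, s, right] = 1.0
R = np.zeros((n_states, n_actions))
R[2, 1] = 1.0

V = np.zeros(n_states)  # uninformed initial value estimate
gamma = 0.9
depths = choose_depths(P, R, V, gamma, threshold=0.5)
policy = adaptive_greedy_policy(P, R, V, gamma, depths)
print("depths:", depths, "policy:", policy)
```

In this sketch a deeper lookahead costs more backups per state but propagates the reward further in one iteration, which is the iteration-count versus per-iteration-cost trade-off the abstract refers to.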
Pages: 9606-9613
Page count: 8