Planning and Learning with Adaptive Lookahead

Cited by: 0

Authors:

Rosenberg, Aviv [1]

Hallak, Assaf [2]

Mannor, Shie [2,3]

Chechik, Gal [2,4]

Dalal, Gal [2]

Affiliations:

[1] Amazon Sci, Seattle, WA USA

[2] Nvidia Res, Santa Clara, CA USA

[3] Technion, Haifa, Israel

[4] Bar Ilan Univ, Ramat Gan, Israel

Source:

THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8 | 2023

Keywords:

SHOGI; CHESS; GO;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.


Pages: 9606-9613

Page count: 8

