Planning and Learning with Adaptive Lookahead

Cited by: 0

Authors:

Rosenberg, Aviv [1]

Hallak, Assaf [2]

Mannor, Shie [2,3]

Chechik, Gal [2,4]

Dalal, Gal [2]

Affiliations:

[1] Amazon Sci, Seattle, WA USA

[2] Nvidia Res, Santa Clara, CA USA

[3] Technion, Haifa, Israel

[4] Bar Ilan Univ, Ramat Gan, Israel

Source:

THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8 | 2023

Keywords:

SHOGI; CHESS; GO;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.
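The idea of choosing the planning horizon per state from the value estimate can be illustrated with a minimal tabular sketch. This is not the authors' algorithm: the function names, the two-level depth choice, and the residual-threshold rule are all assumptions made for illustration. States whose one-step Bellman residual exceeds `threshold` get `max_depth` backups, and all other states get a single backup.

```python
import numpy as np

def bellman_backup(P, R, V, gamma):
    """One-step backup. P: (A, S, S) transitions, R: (S, A) rewards. Returns Q of shape (S, A)."""
    return R + gamma * np.einsum("ast,t->sa", P, V)

def lookahead_values(P, R, V, gamma, max_depth):
    """Row h holds the h-step lookahead value T^h V, for h = 0..max_depth."""
    out = [V]
    for _ in range(max_depth):
        out.append(bellman_backup(P, R, out[-1], gamma).max(axis=1))
    return np.stack(out)  # shape (max_depth + 1, S)

def adaptive_depths(P, R, V, gamma, max_depth, threshold):
    """Hypothetical rule (an assumption): deep lookahead only where the residual is large."""
    residual = np.abs(bellman_backup(P, R, V, gamma).max(axis=1) - V)
    return np.where(residual > threshold, max_depth, 1)

def adaptive_lookahead_update(P, R, V, gamma, max_depth, threshold):
    """Update each state's value using its own lookahead depth."""
    depths = adaptive_depths(P, R, V, gamma, max_depth, threshold)
    values = lookahead_values(P, R, V, gamma, max_depth)
    return values[depths, np.arange(len(V))], depths

# Toy MDP (hypothetical): two absorbing states, one action, reward 1 everywhere.
P = np.eye(2)[None]          # shape (1, 2, 2)
R = np.ones((2, 1))
V0 = np.zeros(2)
new_V, depths = adaptive_lookahead_update(P, R, V0, gamma=0.5, max_depth=2, threshold=0.5)
```

Deeper lookahead costs more computation per update but moves the estimate further toward the fixed point in one step, which is the trade-off between iteration count and per-iteration cost that the abstract refers to.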

Pages: 9606-9613

Page count: 8
