Randomized Exploration for Reinforcement Learning with General Value Function Approximation

Cited: 0
Authors
Ishfaq, Haque [1 ,2 ]
Cui, Qiwen [3 ]
Viet Nguyen [1 ,2 ]
Ayoub, Alex [4 ,5 ]
Yang, Zhuoran [6 ]
Wang, Zhaoran [7 ]
Precup, Doina [1 ,2 ,8 ]
Yang, Lin F. [9 ]
Affiliations
[1] Mila, Montreal, PQ, Canada
[2] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada
[3] Peking Univ, Sch Math Sci, Beijing, Peoples R China
[4] Univ Alberta, Amii, Edmonton, AB, Canada
[5] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
[6] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[7] Northwestern Univ, Ind Engn Management Sci, Evanston, IL 60208 USA
[8] DeepMind, Montreal, PQ, Canada
[9] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises. To attain optimistic value function estimation without resorting to a UCB-style bonus, we introduce an optimistic reward sampling procedure. When the value functions can be represented by a function class $\mathcal{F}$, our algorithm achieves a worst-case regret bound of $\tilde{O}(\mathrm{poly}(d_E H)\sqrt{T})$, where $T$ is the time elapsed, $H$ is the planning horizon and $d_E$ is the eluder dimension of $\mathcal{F}$. In the linear setting, our algorithm reduces to LSVI-PHE, a variant of RLSVI, that enjoys an $\tilde{O}(\sqrt{d^3 H^3 T})$ regret. We complement the theory with an empirical evaluation across known difficult exploration tasks.
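A minimal sketch of the perturbation idea described in the abstract, restricted to the linear setting: several ridge regressions are fit to targets perturbed by i.i.d. scalar noise, and taking the maximum over the resulting Q-estimates gives an optimistic value estimate. The function names, noise scales, and number of samples below are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

def perturbed_q_weights(Phi, targets, lam=1.0, sigma=1.0, num_samples=10, rng=None):
    """Fit several ridge regressions on noise-perturbed targets (illustrative sketch).

    Phi     : (n, d) array of state-action features phi(s, a)
    targets : (n,) array of regression targets, e.g. r + max_a' Q_{h+1}(s', a')
    Returns a (num_samples, d) array of perturbed weight vectors.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = Phi.shape
    A = Phi.T @ Phi + lam * np.eye(d)                   # regularized Gram matrix
    weights = []
    for _ in range(num_samples):
        noise = sigma * rng.standard_normal(n)          # i.i.d. scalar noise per transition
        prior_noise = sigma * rng.standard_normal(d)    # perturbation of the ridge prior
        w = np.linalg.solve(A, Phi.T @ (targets + noise) + np.sqrt(lam) * prior_noise)
        weights.append(w)
    return np.stack(weights)

def optimistic_q(weights, phi_sa):
    """Optimistic Q-value at one (s, a): max over the perturbed estimates."""
    return np.max(weights @ phi_sa)
```

Under this sketch, no UCB-style bonus is computed; optimism comes solely from maximizing over the randomly perturbed regression solutions.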
Pages: 10
Related Papers
50 records in total
  • [21] The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation
    Winnicki, Anna
    Lubars, Joseph
    Livesay, Michael
    Srikant, R.
    OPERATIONS RESEARCH, 2025, 73 (01)
  • [22] Adaptive importance sampling for value function approximation in off-policy reinforcement learning
    Hachiya, Hirotaka
    Akiyama, Takayuki
    Sugiyama, Masashi
    Peters, Jan
    NEURAL NETWORKS, 2009, 22 (10) : 1399 - 1410
  • [23] Ramp Metering for a Distant Downstream Bottleneck Using Reinforcement Learning with Value Function Approximation
    Zhou, Yue
    Ozbay, Kaan
    Kachroo, Pushkin
    Zuo, Fan
    JOURNAL OF ADVANCED TRANSPORTATION, 2020, 2020 (2020)
  • [24] A Clustering-Based Graph Laplacian Framework for Value Function Approximation in Reinforcement Learning
    Xu, Xin
    Huang, Zhenhua
    Graves, Daniel
    Pedrycz, Witold
    IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (12) : 2613 - 2625
  • [25] Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
    Salles Barreto, Andre da Motta
    Anderson, Charles W.
    ARTIFICIAL INTELLIGENCE, 2008, 172 (4-5) : 454 - 482
  • [26] Multiagent reinforcement learning using function approximation
    Abul, O
    Polat, F
    Alhajj, R
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2000, 30 (04): : 485 - 497
  • [27] Resilient Multiagent Reinforcement Learning With Function Approximation
    Ye, Lintao
    Figura, Martin
    Lin, Yixuan
    Pal, Mainak
    Das, Pranoy
    Liu, Ji
    Gupta, Vijay
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (12) : 8497 - 8512
  • [28] Ensemble Methods for Reinforcement Learning with Function Approximation
    Fausser, Stefan
    Schwenker, Friedhelm
    MULTIPLE CLASSIFIER SYSTEMS, 2011, 6713 : 56 - 65
  • [29] Distributional reinforcement learning with linear function approximation
    Bellemare, Marc G.
    Le Roux, Nicolas
    Castro, Pablo Samuel
    Moitra, Subhodeep
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [30] Reinforcement learning with function approximation converges to a region
    Gordon, GJ
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 13, 2001, 13 : 1040 - 1046