Multigrid methods for policy evaluation and reinforcement learning

Cited by: 0

Authors:

Ziv, O [1]

Shimkin, N [1]

Affiliation:

[1] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel

Source:

2005 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT CONTROL & 13TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION, VOLS 1 AND 2 | 2005

Keywords:

FUNCTION APPROXIMATION;

DOI:

Not available

Chinese Library Classification (CLC):

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

We introduce a new class of multigrid temporal-difference learning algorithms for speeding up the estimation of the value function related to a stationary policy, within the context of discounted cost Markov Decision Processes with linear functional approximation. The proposed scheme builds on the multigrid framework which is used in numerical analysis to enhance the iterative solution of linear equations. We first apply the multigrid approach to policy evaluation in the known model case. We then extend this approach to the learning case, and propose a scheme in which the basic TD(lambda) learning algorithm is applied at various resolution scales. The efficacy of the proposed algorithms is demonstrated through simulation experiments.
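The known-model case mentioned in the abstract can be illustrated with a generic two-grid cycle for the policy evaluation equation (I - γP)V = r. The sketch below is not the authors' algorithm. It assumes a toy symmetric random walk with reflecting ends, pairwise state aggregation as the coarse grid, and illustrative names (`two_grid_cycle`, `coarse_matrix`, `GAMMA`, `N`) that do not come from the paper. With averaging as restriction and piecewise-constant prolongation, the coarse operator is again of the form I - γP_c for an aggregated chain P_c, so the coarse problem is itself a small policy evaluation problem.

```python
# Two-grid policy evaluation sketch (illustrative; not the paper's algorithm).
# Assumed toy model: symmetric random walk on N states with reflecting ends.

GAMMA = 0.9   # discount factor (assumed)
N = 16        # number of fine-grid states (assumed, must be even)


def matvec(M, v):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def random_walk(n):
    """Transition matrix: step left or right with probability 0.5 each."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][max(i - 1, 0)] += 0.5
        P[i][min(i + 1, n - 1)] += 0.5
    return P


def sweeps(P, b, v, count):
    """Fixed-point sweeps v <- b + GAMMA * P v for (I - GAMMA P) v = b."""
    for _ in range(count):
        Pv = matvec(P, v)
        v = [bi + GAMMA * pvi for bi, pvi in zip(b, Pv)]
    return v


def restrict(v):
    """Average each pair of neighbouring fine states into one coarse state."""
    return [0.5 * (v[2 * j] + v[2 * j + 1]) for j in range(len(v) // 2)]


def prolong(vc):
    """Copy each coarse value to both fine states it aggregates."""
    return [vc[i // 2] for i in range(2 * len(vc))]


def coarse_matrix(P):
    """Aggregated chain P_c = R P Pr, so R (I - GAMMA P) Pr = I - GAMMA P_c."""
    n = len(P)
    Pc = [[0.0] * (n // 2) for _ in range(n // 2)]
    for i in range(n):
        for k in range(n):
            Pc[i // 2][k // 2] += 0.5 * P[i][k]
    return Pc


def two_grid_cycle(P, Pc, r, v, nu=2, coarse_sweeps=200):
    """One cycle: smooth, correct the error on the coarse grid, smooth again."""
    v = sweeps(P, r, v, nu)                                   # pre-smoothing
    Pv = matvec(P, v)
    residual = [ri + GAMMA * pvi - vi for ri, pvi, vi in zip(r, Pv, v)]
    # Approximate coarse solve of (I - GAMMA P_c) e_c = R * residual.
    ec = sweeps(Pc, restrict(residual), [0.0] * len(Pc), coarse_sweeps)
    v = [vi + ei for vi, ei in zip(v, prolong(ec))]           # correction
    return sweeps(P, r, v, nu)                                # post-smoothing


P = random_walk(N)
Pc = coarse_matrix(P)
r = [1.0 if i == N - 1 else 0.0 for i in range(N)]  # cost only in the last state
v = [0.0] * N
for _ in range(60):
    v = two_grid_cycle(P, Pc, r, v)

# Largest violation of the Bellman equation V = r + GAMMA * P V.
Pv = matvec(P, v)
bellman_residual = max(abs(ri + GAMMA * pvi - vi) for ri, pvi, vi in zip(r, Pv, v))
print(bellman_residual)
```

A full multigrid scheme would apply the same cycle recursively to the coarse problem instead of running many coarse sweeps, and the learning case described in the abstract would replace the model-based sweeps with TD(lambda) updates; neither extension is shown here.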


Pages: 1391-1396

Number of pages: 6

