Multigrid methods for policy evaluation and reinforcement learning

Times cited: 0

Authors:

Ziv, O [1]

Shimkin, N [1]

Affiliation:

[1] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel

Source:

2005 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT CONTROL & 13TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION, VOLS 1 AND 2 | 2005

Keywords:

FUNCTION APPROXIMATION;

DOI:

Not available

CLC classification number:

TP [Automation technology, computer technology];

Discipline code:

0812 ;

Abstract:

We introduce a new class of multigrid temporal-difference learning algorithms for speeding up the estimation of the value function of a stationary policy, within the context of discounted-cost Markov Decision Processes with linear function approximation. The proposed scheme builds on the multigrid framework, which is used in numerical analysis to accelerate the iterative solution of linear equations. We first apply the multigrid approach to policy evaluation in the known-model case. We then extend this approach to the learning case and propose a scheme in which the basic TD(lambda) learning algorithm is applied at various resolution scales. The efficacy of the proposed algorithms is demonstrated through simulation experiments.
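To illustrate the known-model idea in general terms, here is a minimal sketch of a textbook two-grid cycle applied to the policy-evaluation equations (I - gamma*P) V = r. It rests on several assumptions that do not come from the paper: a hypothetical toy problem (a symmetric random walk with reflecting ends on a chain of n states, n even), tabular values (the indicator-feature special case of linear function approximation), damped value iteration with a hypothetical damping factor omega = 2/3 as the smoother, and pairwise aggregation of neighboring states as the coarse grid with a Galerkin coarse operator. This is not the authors' algorithm, the TD(lambda) learning variant is not shown, and all function names are hypothetical.

```python
"""Two-grid policy evaluation on a toy chain MDP (known-model case).

Illustrative sketch only: the chain, the damping factor and the pairwise
aggregation are hypothetical choices, not taken from the paper.
Solves the policy-evaluation equations (I - gamma * P) V = r.
"""


def build_chain(n, gamma):
    """A = I - gamma * P for a symmetric random walk with reflecting ends."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] += 1.0
        for j in (i - 1, i + 1):
            k = min(max(j, 0), n - 1)  # stepping off an end leaves the state unchanged
            A[i][k] -= 0.5 * gamma
    return A


def residual(A, V, r):
    """Bellman residual r + gamma * P V - V, written as r - A V."""
    return [ri - sum(a * v for a, v in zip(row, V)) for ri, row in zip(r, A)]


def smooth(A, V, r, sweeps, omega=2.0 / 3.0):
    """Damped value iteration V <- V + omega * residual (omega=1 is plain VI)."""
    for _ in range(sweeps):
        V = [v + omega * e for v, e in zip(V, residual(A, V, r))]
    return V


def solve_dense(M, b):
    """Gaussian elimination with partial pivoting, used for the coarse solve."""
    m = len(b)
    T = [row[:] + [bi] for row, bi in zip(M, b)]  # augmented copy; M is untouched
    for c in range(m):
        p = max(range(c, m), key=lambda i: abs(T[i][c]))
        T[c], T[p] = T[p], T[c]
        for i in range(c + 1, m):
            f = T[i][c] / T[c][c]
            for j in range(c, m + 1):
                T[i][j] -= f * T[c][j]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(T[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (T[i][m] - s) / T[i][i]
    return x


def coarse_operator(A):
    """Galerkin coarse matrix R A R^T, aggregating states {2j, 2j+1} into j."""
    nc = len(A) // 2
    Ac = [[0.0] * nc for _ in range(nc)]
    for i, row in enumerate(A):
        for l, a in enumerate(row):
            Ac[i // 2][l // 2] += a
    return Ac


def two_grid_cycle(A, Ac, V, r, sweeps=1):
    """One cycle: smooth, coarse-grid correction, smooth."""
    V = smooth(A, V, r, sweeps)                      # pre-smoothing on the fine grid
    res = residual(A, V, r)
    res_c = [res[2 * j] + res[2 * j + 1] for j in range(len(Ac))]  # restriction R
    e_c = solve_dense(Ac, res_c)                     # exact coarse-grid correction
    V = [v + e_c[i // 2] for i, v in enumerate(V)]   # prolongation R^T (piecewise constant)
    return smooth(A, V, r, sweeps)                   # post-smoothing


def compare(n=64, gamma=0.99, cycles=40):
    """Max-norm errors of two-grid vs plain value iteration with equal fine sweeps."""
    A = build_chain(n, gamma)
    r = [1.0 if i < n // 2 else 0.0 for i in range(n)]
    V_exact = solve_dense(A, r)
    Ac = coarse_operator(A)

    V_mg = [0.0] * n
    for _ in range(cycles):
        V_mg = two_grid_cycle(A, Ac, V_mg, r)
    V_vi = smooth(A, [0.0] * n, r, 2 * cycles, omega=1.0)

    def max_err(V):
        return max(abs(a - b) for a, b in zip(V, V_exact))

    return max_err(V_mg), max_err(V_vi)


if __name__ == "__main__":
    print(compare())
```

compare() reports the max-norm error of the two-grid iterate and of plain value iteration after the same number of fine-grid sweeps. The damping is a deliberate choice: with omega = 1 the smoother is plain value iteration, which barely damps the alternating (checkerboard) error component on this chain, and piecewise-constant pairwise aggregation cannot represent that component either, so the coarse correction would not help with it.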

Pages: 1391-1396

Number of pages: 6
