Multigrid methods for policy evaluation and reinforcement learning

Times cited: 0

Authors:

Ziv, O [1]

Shimkin, N [1]

Affiliation:

[1] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel

Source:

2005 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT CONTROL & 13TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION, VOLS 1 AND 2 | 2005

Keywords:

FUNCTION APPROXIMATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

We introduce a new class of multigrid temporal-difference learning algorithms for speeding up the estimation of the value function associated with a stationary policy, within the context of discounted-cost Markov Decision Processes with linear function approximation. The proposed scheme builds on the multigrid framework, which is used in numerical analysis to accelerate the iterative solution of linear equations. We first apply the multigrid approach to policy evaluation in the known-model case. We then extend this approach to the learning case and propose a scheme in which the basic TD(λ) learning algorithm is applied at various resolution scales. The efficacy of the proposed algorithms is demonstrated through simulation experiments.
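In the known-model case, policy evaluation for a fixed policy amounts to solving the linear equation (I - gamma P) V = c, where P is the policy's transition matrix, c the per-state cost and gamma the discount factor. As a rough illustration only, the sketch below applies a generic two-grid correction scheme to that equation. It is not the authors' algorithm, and every name in it (`lazy_chain`, `pairwise_prolongation`, `value_iteration_sweeps`, `two_grid_policy_evaluation`) is hypothetical. It assumes a small tabular chain (a lazy random walk with reflecting ends), plain value-iteration sweeps V <- c + gamma P V as the smoother, a coarse grid formed by aggregating neighbouring states in pairs (equivalently, piecewise-constant features), and a Galerkin coarse operator that is solved directly.

```python
import numpy as np


def lazy_chain(n):
    """Hypothetical policy chain: lazy random walk on n states.

    Each state stays put with probability 1/2 and moves to each neighbour
    with probability 1/4; moves off either end are reflected back in place.
    The resulting transition matrix is symmetric and row-stochastic.
    """
    P = np.zeros((n, n))
    for i in range(n):
        P[i, i] += 0.5
        P[i, max(i - 1, 0)] += 0.25
        P[i, min(i + 1, n - 1)] += 0.25
    return P


def pairwise_prolongation(n):
    """Piecewise-constant interpolation from n // 2 aggregates (pairs) to n states."""
    assert n % 2 == 0, "pairwise aggregation needs an even number of states"
    Pr = np.zeros((n, n // 2))
    for i in range(n):
        Pr[i, i // 2] = 1.0
    return Pr


def value_iteration_sweeps(V, c, P, gamma, sweeps):
    """Smoother: repeated Bellman backups V <- c + gamma * P @ V."""
    for _ in range(sweeps):
        V = c + gamma * (P @ V)
    return V


def two_grid_policy_evaluation(c, P, gamma, cycles=50, nu1=2, nu2=2):
    """Two-grid correction scheme for the linear system (I - gamma P) V = c."""
    n = len(c)
    A = np.eye(n) - gamma * P
    Pr = pairwise_prolongation(n)
    R = Pr.T                        # restriction: transpose of prolongation
    A_coarse = R @ A @ Pr           # Galerkin coarse-grid operator
    V = np.zeros(n)
    for _ in range(cycles):
        V = value_iteration_sweeps(V, c, P, gamma, nu1)     # pre-smoothing
        residual = c - A @ V                                 # Bellman residual c + gamma P V - V
        e_coarse = np.linalg.solve(A_coarse, R @ residual)   # coarse error equation
        V = V + Pr @ e_coarse                                # coarse-grid correction
        V = value_iteration_sweeps(V, c, P, gamma, nu2)     # post-smoothing
    return V


if __name__ == "__main__":
    n, gamma = 64, 0.99
    P = lazy_chain(n)
    c = np.linspace(0.0, 1.0, n)                 # hypothetical per-state costs
    V_exact = np.linalg.solve(np.eye(n) - gamma * P, c)
    V_mg = two_grid_policy_evaluation(c, P, gamma)
    # Same number of fine-grid sweeps (50 cycles x 4 sweeps), no coarse correction.
    V_vi = value_iteration_sweeps(np.zeros(n), c, P, gamma, 200)
    print("two-grid max error:       ", np.max(np.abs(V_mg - V_exact)))
    print("value iteration max error:", np.max(np.abs(V_vi - V_exact)))
```

Using the transpose of the prolongation as the restriction makes the coarse-grid step an A-orthogonal projection when P is symmetric, as it is for this chain; for non-reversible chains that property, and the convergence guarantee it supports, does not carry over automatically. The lazy chain is chosen so that undamped value-iteration sweeps damp the oscillatory error components that the pairwise coarse grid cannot represent; a non-lazy walk would likely need a damped sweep instead.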


Pages: 1391-1396

Number of pages: 6
