Multigrid methods for policy evaluation and reinforcement learning

Times cited: 0
Authors
Ziv, O [1]
Shimkin, N [1]
Affiliation
[1] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel
Keywords
FUNCTION APPROXIMATION
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We introduce a new class of multigrid temporal-difference learning algorithms that speed up the estimation of the value function associated with a stationary policy, in the setting of discounted-cost Markov Decision Processes with linear function approximation. The proposed scheme builds on the multigrid framework, which is used in numerical analysis to accelerate the iterative solution of linear equations. We first apply the multigrid approach to policy evaluation when the model is known. We then extend it to the learning case and propose a scheme in which the basic TD(λ) learning algorithm is applied at several resolution scales. The efficacy of the proposed algorithms is demonstrated through simulation experiments.
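The abstract does not spell out the algorithm. As a rough illustration of the known-model idea only, the sketch below applies a generic two-grid correction scheme (a standard multigrid building block, not necessarily the authors' scheme) to the Bellman equation (I - γP)V = c of a fixed policy. Everything concrete is an assumption: the reflecting random-walk chain, the cost vector, pairwise aggregation as the coarse grid, the damping factor ω = 0.5, and the helper names `random_walk_chain`, `aggregation_prolongation`, `smooth` and `two_grid_policy_evaluation` are all hypothetical. The learning case (TD(λ) at several resolutions) is not shown.

```python
import numpy as np


def random_walk_chain(n: int) -> np.ndarray:
    """Hypothetical policy: symmetric random walk on n states with reflecting ends."""
    P = np.zeros((n, n))
    for i in range(n):
        P[i, max(i - 1, 0)] += 0.5
        P[i, min(i + 1, n - 1)] += 0.5
    return P


def aggregation_prolongation(n: int) -> np.ndarray:
    """Assumed coarse grid: coarse state j covers fine states 2j and 2j+1."""
    Pr = np.zeros((n, (n + 1) // 2))
    for i in range(n):
        Pr[i, i // 2] = 1.0
    return Pr


def smooth(V, c, P, gamma, omega, sweeps):
    """Damped value-iteration sweeps: V <- V + omega * (c + gamma*P V - V)."""
    for _ in range(sweeps):
        V = V + omega * (c + gamma * P @ V - V)
    return V


def two_grid_policy_evaluation(c, P, gamma, cycles=50, sweeps=2, omega=0.5):
    """Generic two-grid solver for (I - gamma P) V = c (illustrative sketch)."""
    n = len(c)
    A = np.eye(n) - gamma * P                 # Bellman operator for a fixed policy
    Pr = aggregation_prolongation(n)          # coarse -> fine interpolation
    R = Pr.T / Pr.sum(axis=0)[:, None]        # fine -> coarse averaging
    A_c = R @ A @ Pr                          # Galerkin coarse operator
    V = np.zeros(n)
    for _ in range(cycles):
        V = smooth(V, c, P, gamma, omega, sweeps)  # pre-smoothing
        residual = c - A @ V                       # Bellman residual on fine grid
        e_c = np.linalg.solve(A_c, R @ residual)   # exact coarse-grid solve
        V = V + Pr @ e_c                           # coarse-grid correction
        V = smooth(V, c, P, gamma, omega, sweeps)  # post-smoothing
    return V


# Usage on a toy problem (all values hypothetical).
n, gamma = 64, 0.99
P = random_walk_chain(n)
c = np.linspace(0.0, 1.0, n)                  # hypothetical per-state cost
V_exact = np.linalg.solve(np.eye(n) - gamma * P, c)
V_mg = two_grid_policy_evaluation(c, P, gamma)
# Plain (undamped) value iteration with the same budget of 200 fine-grid sweeps.
V_vi = smooth(np.zeros(n), c, P, gamma, omega=1.0, sweeps=200)
print(np.max(np.abs(V_mg - V_exact)), np.max(np.abs(V_vi - V_exact)))
```

Damped value-iteration sweeps quickly remove oscillatory error components but shrink smooth ones only by roughly a factor of γ per sweep. The coarse-grid correction targets those smooth components, which is why, with γ close to 1, the corrected iteration in this toy setting ends far closer to the exact solution than plain value iteration given the same number of fine-grid sweeps.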
Pages: 1391-1396
Number of pages: 6