POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES

Cited: 0
Authors
Zhou, Yi [1 ]
Fu, Michael C. [2 ]
Ryzhov, Ilya O.
Affiliations
[1] Univ Maryland, Inst Syst Res, Dept Math, 8223 Paint Branch Dr, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Syst Res, Robert H Smith Sch Business, 7699 Mowatt Ln, College Pk, MD 20742 USA
Keywords
OPTIMIZATION
DOI
Not available
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In this paper, we consider policy evaluation in a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation, which can be further transformed into a ratio of two stochastic gradients. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.
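As a hedged illustration of the ratio-of-gradients idea described above (our notation and assumptions, not taken verbatim from the paper): if the conditioning state X has a density f_X that is positive at x, then

E[g(Y) | X = x] = (d/dx E[g(Y) 1{X <= x}]) / (d/dx E[1{X <= x}]),

so the conditional expectation in the Bellman equation becomes a ratio of two derivatives of ordinary, unconditional expectations, each of which can be estimated from simulation. The minimal sketch below estimates this ratio with central finite differences on a toy one-step model; the model Y = X + noise, the payoff g, and the sample sizes are hypothetical choices for illustration only, not the paper's estimator.

import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Draw n unconditioned pairs (X, Y) from a toy one-step model."""
    X = rng.normal(0.0, 1.0, size=n)       # starting states drawn at random
    Y = X + rng.normal(0.0, 0.5, size=n)   # toy outcome / next-state variable
    return X, Y

def g(y):
    """Toy payoff standing in for the value of the outcome."""
    return y ** 2

def fd_conditional_expectation(x, n=200_000, h=0.05):
    """Estimate E[g(Y) | X = x] as a ratio of two central finite differences:
    d/dx E[g(Y) 1{X <= x}] divided by d/dx E[1{X <= x}]."""
    X, Y = simulate(n)
    num = (np.mean(g(Y) * (X <= x + h)) - np.mean(g(Y) * (X <= x - h))) / (2 * h)
    den = (np.mean(X <= x + h) - np.mean(X <= x - h)) / (2 * h)
    return num / den

print(fd_conditional_expectation(x=1.0))  # roughly 1.25 = E[(1 + eps)^2] with eps ~ N(0, 0.5^2)

The practical point mirrored here is that the pairs (X, Y) are simulated without conditioning on X = x, so sample paths started from other states still contribute to the estimate at x, in the spirit of the abstract's final sentence.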
Pages: 3039-3050
Number of pages: 12