POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES

Times cited: 0
Authors
Zhou, Yi [1]
Fu, Michael C. [2]
Ryzhov, Ilya O.
Affiliations
[1] Univ Maryland, Inst Syst Res, Dept Math, 8223 Paint Branch Dr, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Syst Res, Robert H Smith Sch Business, 7699 Mowatt Ln, College Pk, MD 20742 USA
Keywords
OPTIMIZATION
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this paper, we consider policy evaluation in a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation, which can be further transformed into a ratio of two stochastic gradients. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.
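The ratio-of-derivatives idea can be sketched concretely. Assume the initial state X0 has a smooth density and let Y be the total reward of a sample path. Under suitable smoothness assumptions, E[Y | X0 = x] = (d/dx E[Y 1{X0 <= x}]) / (d/dx E[1{X0 <= x}]). Replacing both derivatives with central finite differences yields an estimator that uses paths started from many states near x, not only from x itself. The example below is a minimal sketch of the finite difference variant only; it does not include the generalized likelihood ratio estimator and is not the authors' implementation. The toy model is a hypothetical assumption chosen so the exact value is known: scalar state, fixed policy a = -0.5x, dynamics x' = x + a + N(0, 1) noise, reward r(x, a) = x, and horizon 3. The function names `simulate_return` and `fd_value_estimate` are illustrative.

```python
import random

# Hypothetical toy problem (an assumption for illustration, not from the paper):
# scalar state x, fixed linear policy a = -0.5 * x, dynamics
# x_{k+1} = x_k + a_k + eps_k with eps_k ~ N(0, 1), and reward r(x, a) = x.
HORIZON = 3


def policy(x):
    return -0.5 * x


def simulate_return(x0, rng):
    """Total reward along one sample path of length HORIZON starting at x0."""
    x, total = x0, 0.0
    for _ in range(HORIZON):
        a = policy(x)
        total += x  # reward r(x, a) = x
        x = x + a + rng.gauss(0.0, 1.0)
    return total


def fd_value_estimate(x, starts, returns, h):
    """Central finite-difference estimate of E[Y | X0 = x] as the ratio of
    d/dx E[Y 1{X0 <= x}] to d/dx E[1{X0 <= x}]."""
    n = len(starts)
    num = 0.0  # -> (E[Y 1{X0 <= x+h}] - E[Y 1{X0 <= x-h}]) / (2h)
    den = 0.0  # -> (P(X0 <= x+h) - P(X0 <= x-h)) / (2h)
    for x0, y in zip(starts, returns):
        # 1{X0 <= x+h} - 1{X0 <= x-h} equals 1 exactly when x-h < X0 <= x+h
        if x - h < x0 <= x + h:
            num += y
            den += 1.0
    if den == 0.0:
        raise ValueError("no sample paths start within h of x; increase n or h")
    num /= 2 * h * n
    den /= 2 * h * n
    return num / den


def main():
    rng = random.Random(0)
    n = 200_000
    # Paths start from many different states, not only the target state x.
    starts = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    returns = [simulate_return(x0, rng) for x0 in starts]
    x = 1.0
    est = fd_value_estimate(x, starts, returns, h=0.05)
    # In this toy model E[x_k | x_0 = x] = 0.5**k * x, so V_0(x) = 1.75 * x.
    exact = x * (1 + 0.5 + 0.25)
    print(f"FD estimate: {est:.3f}, exact: {exact:.3f}")
    return est, exact


if __name__ == "__main__":
    main()
```

Because the 2hn normalization cancels in the ratio, the estimate reduces to averaging the returns of paths whose starting state lies within h of x. In general, a smaller h lowers the finite difference bias but leaves fewer paths in the window, which raises the variance.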
Pages: 3039-3050
Number of pages: 12