POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES

Cited by: 0
Authors
Zhou, Yi [1]
Fu, Michael C. [2]
Ryzhov, Ilya O.
Affiliations
[1] Univ Maryland, Inst Syst Res, Dept Math, 8223 Paint Branch Dr, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Syst Res, Robert H Smith Sch Business, 7699 Mowatt Ln, College Pk, MD 20742 USA
Keywords
OPTIMIZATION;
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In this paper, we consider policy evaluation in a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation, which can be further transformed into a ratio of two stochastic gradients. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.
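The ratio idea in the abstract can be illustrated with a minimal sketch; it is not the authors' estimator. It assumes the standard identity E[Y | X = x] = (d/dx E[Y 1{X <= x}]) / (d/dx E[1{X <= x}]) and approximates both derivatives with central finite differences, using samples whose starting states X are scattered around the state of interest. The function name fd_conditional_mean, the bandwidth h, and the toy model Y = X^2 + noise are hypothetical choices made for this illustration only; the generalized likelihood ratio variant is not shown.

```python
import numpy as np


def fd_conditional_mean(x_samples, y_samples, x0, h):
    """Central finite-difference ratio estimate of E[Y | X = x0].

    Illustrative only: the name, signature, and bandwidth h are assumptions,
    not taken from the paper.
    """
    x = np.asarray(x_samples, dtype=float)
    y = np.asarray(y_samples, dtype=float)

    # 1{X <= x0 + h} - 1{X <= x0 - h} equals 1 exactly when x0 - h < X <= x0 + h.
    in_window = ((x > x0 - h) & (x <= x0 + h)).astype(float)

    # Finite-difference estimates of d/dx E[Y 1{X <= x}] and d/dx E[1{X <= x}].
    numerator = np.mean(y * in_window) / (2.0 * h)
    denominator = np.mean(in_window) / (2.0 * h)

    if denominator == 0.0:
        raise ValueError("no samples within h of x0; increase h or the sample size")
    return numerator / denominator


# Toy check (hypothetical model): starting states are drawn from many
# locations, and the outcome satisfies E[Y | X = x] = x**2.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = x**2 + rng.normal(scale=0.1, size=n)

estimate = fd_conditional_mean(x, y, x0=1.0, h=0.05)
print(f"estimate of E[Y | X = 1.0]: {estimate:.3f}")
```

In this sketch no sample path has to start exactly at x0: every path that starts within h of x0 contributes. A smaller h reduces the finite-difference bias but leaves fewer samples in the window, which increases the variance.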
Pages: 3039-3050
Number of pages: 12