POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES

Cited: 0
Authors
Zhou, Yi [1 ]
Fu, Michael C. [2 ]
Ryzhov, Ilya O.
Affiliations
[1] Univ Maryland, Inst Syst Res, Dept Math, 8223 Paint Branch Dr, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Syst Res, Robert H Smith Sch Business, 7699 Mowatt Ln, College Pk, MD 20742 USA
Keywords
OPTIMIZATION
DOI
Not available
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In this paper, we consider policy evaluation in a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation, which can be further transformed into a ratio of two stochastic gradients. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.
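As a hedged illustration of the ratio-of-gradients idea described above (our notation and assumptions, not taken verbatim from the paper): if the conditioning state X has a density f_X that is positive at x, then

E[g(Y) | X = x] = (d/dx E[g(Y) 1{X <= x}]) / (d/dx E[1{X <= x}]),

so the conditional expectation in the Bellman equation becomes a ratio of two derivatives of ordinary, unconditional expectations, each of which can be estimated from simulation. The minimal sketch below estimates this ratio with central finite differences on a toy one-step model; the model Y = X + noise, the payoff g, and the sample sizes are hypothetical choices for illustration only, not the paper's estimator.

import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Draw n unconditioned pairs (X, Y) from a toy one-step model."""
    X = rng.normal(0.0, 1.0, size=n)       # starting states drawn at random
    Y = X + rng.normal(0.0, 0.5, size=n)   # toy outcome / next-state variable
    return X, Y

def g(y):
    """Toy payoff standing in for the value of the outcome."""
    return y ** 2

def fd_conditional_expectation(x, n=200_000, h=0.05):
    """Estimate E[g(Y) | X = x] as a ratio of two central finite differences:
    d/dx E[g(Y) 1{X <= x}] divided by d/dx E[1{X <= x}]."""
    X, Y = simulate(n)
    num = (np.mean(g(Y) * (X <= x + h)) - np.mean(g(Y) * (X <= x - h))) / (2 * h)
    den = (np.mean(X <= x + h) - np.mean(X <= x - h)) / (2 * h)
    return num / den

print(fd_conditional_expectation(x=1.0))  # roughly 1.25 = E[(1 + eps)^2] with eps ~ N(0, 0.5^2)

The practical point mirrored here is that the pairs (X, Y) are simulated without conditioning on X = x, so sample paths started from other states still contribute to the estimate at x, in the spirit of the abstract's final sentence.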
Pages: 3039-3050
Number of pages: 12