POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES

Times cited: 0

Authors:

Zhou, Yi [1]

Fu, Michael C. [2]

Ryzhov, Ilya O.

Affiliations:

[1] Univ Maryland, Inst Syst Res, Dept Math, 8223 Paint Branch Dr, College Pk, MD 20742 USA

[2] Univ Maryland, Inst Syst Res, Robert H Smith Sch Business, 7699 Mowatt Ln, College Pk, MD 20742 USA

Source:

2022 WINTER SIMULATION CONFERENCE (WSC) | 2022

Keywords:

OPTIMIZATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Discipline classification code:

081202;

Abstract:

In this paper, we consider policy evaluation in a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation, which can be further transformed into a ratio of two stochastic gradients. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.
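The ratio idea in the abstract can be illustrated with a small sketch. Let X be the state at some stage and Y the return collected from that state onward under the fixed policy. Assuming X has a positive density at x, E[Y | X = x] equals d/dx E[Y 1{X <= x}] divided by d/dx P(X <= x), and each derivative can be estimated with a central finite difference over sample paths whose starting states fall near x. The two-step toy problem, the linear policy a = -0.5 s, the noise level sigma = 0.1, the window half-width h, and the helper names simulate_returns and fd_ratio_value_estimate are all hypothetical choices made for illustration. This is a plain finite-difference ratio, not the authors' estimator, and the generalized likelihood ratio variant is not shown.

```python
import numpy as np


def simulate_returns(start_states, rng, sigma=0.1):
    """Hypothetical two-step toy problem under a fixed linear policy a = -0.5 * s.

    The reward at each step is -s**2, and the next state is s + a + noise,
    so the true value is V(s) = -1.25 * s**2 - sigma**2.
    Returns the total reward collected from each start state.
    """
    actions = -0.5 * start_states  # fixed (hypothetical) policy
    noise = rng.normal(0.0, sigma, size=start_states.shape)
    next_states = start_states + actions + noise  # one transition
    return -start_states**2 - next_states**2  # cumulative reward over the horizon


def fd_ratio_value_estimate(x, states, returns, h):
    """Estimate V(x) = E[Y | X = x] as a ratio of two central finite differences.

    numerator   ~ d/dx E[Y * 1{X <= x}]  = E[Y * 1{x-h < X <= x+h}] / (2h)
    denominator ~ d/dx P(X <= x)         = P(x-h < X <= x+h) / (2h)
    """
    window = (states > x - h) & (states <= x + h)
    numerator = np.mean(returns * window) / (2 * h)
    denominator = np.mean(window) / (2 * h)
    return numerator / denominator


rng = np.random.default_rng(0)
n = 200_000
states = rng.uniform(0.0, 2.0, size=n)  # sample paths start from many different states
returns = simulate_returns(states, rng)

estimate = fd_ratio_value_estimate(1.0, states, returns, h=0.05)
true_value = -1.25 * 1.0**2 - 0.1**2  # closed form for this toy model
print(f"estimate={estimate:.3f}, true={true_value:.3f}")
```

Because both finite differences share the factor 1/(2h), the ratio reduces to the average return over paths that start in the window (x - h, x + h]. This is how paths starting from states other than x contribute to the estimate of V(x). Shrinking h lowers the bias but leaves fewer paths in the window, so the variance grows.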


Pages: 3039–3050

Number of pages: 12
