POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES

Cited by: 0

Authors:

Zhou, Yi [1]

Fu, Michael C. [2]

Ryzhov, Ilya O.

Affiliations:

[1] Univ Maryland, Inst Syst Res, Dept Math, 8223 Paint Branch Dr, College Pk, MD 20742 USA

[2] Univ Maryland, Inst Syst Res, Robert H Smith Sch Business, 7699 Mowatt Ln, College Pk, MD 20742 USA

Source:

2022 WINTER SIMULATION CONFERENCE (WSC) | 2022

Keywords:

OPTIMIZATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

In this paper, we consider policy evaluation in a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation, which can be further transformed into a ratio of two stochastic gradients. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.
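To illustrate the ratio idea in the abstract, the following is a minimal sketch and not the authors' estimator. It assumes the standard identity E[Y | X = x] = (d/dx E[Y 1{X <= x}]) / (d/dx E[1{X <= x}]) and approximates both derivatives with central finite differences over samples whose X values are scattered around the point of interest. The function names, the toy model Y = 2X + 1 + noise, and the bandwidth h are hypothetical choices made only for this example.

```python
import random


def fd_conditional_mean(xs, ys, x0, h):
    """Estimate E[Y | X = x0] as a ratio of two central finite differences.

    Numerator:   d/dx E[Y * 1{X <= x}] evaluated at x0
    Denominator: d/dx E[1{X <= x}]     evaluated at x0
    Both expectations are replaced by sample averages.
    """
    n = len(xs)
    num_hi = sum(y for x, y in zip(xs, ys) if x <= x0 + h) / n
    num_lo = sum(y for x, y in zip(xs, ys) if x <= x0 - h) / n
    den_hi = sum(1 for x in xs if x <= x0 + h) / n
    den_lo = sum(1 for x in xs if x <= x0 - h) / n
    num = (num_hi - num_lo) / (2 * h)
    den = (den_hi - den_lo) / (2 * h)
    if den == 0:
        raise ValueError("no samples near x0; increase h or the sample size")
    return num / den


def simulate(n, seed=0):
    """Hypothetical toy model: X ~ N(0, 1), Y = 2X + 1 + N(0, 0.5^2) noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


xs, ys = simulate(100_000)
estimate = fd_conditional_mean(xs, ys, x0=0.5, h=0.05)
print(estimate)  # the true conditional mean in this toy model is 2 * 0.5 + 1 = 2.0
```

In this sketch no sample starts exactly at x0 = 0.5, yet the estimate is built from samples that start nearby, which mirrors the abstract's point that the value of one state can be estimated from sample paths starting at other states. The bandwidth h trades bias against variance, as in any finite difference scheme.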


Pages: 3039 - 3050

Number of pages: 12
