POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES

Cited by: 0
Authors
Zhou, Yi [1]
Fu, Michael C. [2]
Ryzhov, Ilya O.
Affiliations
[1] Univ Maryland, Inst Syst Res, Dept Math, 8223 Paint Branch Dr, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Syst Res, Robert H Smith Sch Business, 7699 Mowatt Ln, College Pk, MD 20742 USA
Keywords
OPTIMIZATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
In this paper, we consider policy evaluation in a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation, which can be further transformed into a ratio of two stochastic gradients. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.
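The abstract does not spell out the estimators, so the following is only a minimal sketch of the general idea, not the authors' method. It relies on the standard identity E[Y | X = x] = (d/dx E[Y 1{X <= x}]) / (d/dx E[1{X <= x}]), which holds when X has a density that is positive at x, and approximates both derivatives with a central finite difference of half-width h. The toy dynamics, the reward, the names `simulate_return` and `fd_ratio_value`, and all parameter values are hypothetical choices made for illustration. The generalized likelihood ratio variant is not shown.

```python
import random


def simulate_return(x0, rng, horizon=3):
    """Cumulative reward of one sample path started at x0.

    Hypothetical toy model: stage reward -x^2 and closed-loop dynamics
    x' = 0.5 * x + noise under some fixed policy.
    """
    x, total = x0, 0.0
    for _ in range(horizon):
        total += -x * x
        x = 0.5 * x + rng.gauss(0.0, 0.1)
    return total


def fd_ratio_value(x, starts, returns, h):
    """Estimate V(x) = E[return | start = x] as a ratio of two finite differences.

    Numerator:   central difference of E[return * 1{start <= x}] in x.
    Denominator: central difference of P(start <= x) in x.
    Both use the same sample paths, none of which needs to start exactly at x.
    """
    n = len(starts)
    num = sum(y for s, y in zip(starts, returns) if x - h < s <= x + h)
    cnt = sum(1 for s in starts if x - h < s <= x + h)
    num_fd = num / (2.0 * h * n)
    den_fd = cnt / (2.0 * h * n)
    if den_fd == 0.0:
        raise ValueError("no sample paths start near x; increase h or n")
    return num_fd / den_fd


rng = random.Random(0)
# Sample paths start from many different states, not from the query state.
starts = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
returns = [simulate_return(s, rng) for s in starts]

estimate = fd_ratio_value(1.0, starts, returns, h=0.05)
# Closed form for this toy model only: V(x) = -(1.3125 * x^2 + 0.0225).
exact = -(1.3125 * 1.0 ** 2 + 0.0225)
print(estimate, exact)
```

With a central difference the factors 1 / (2hn) cancel, so the ratio reduces to the average return over paths that start within h of x. The half-width h trades bias (large h) against variance (small h, few paths in the window).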
Pages: 3039-3050
Page count: 12
Related papers
50 in total
  • [31] Stochastic Policy Gradient Ascent in Reproducing Kernel Hilbert Spaces
    Paternain, Santiago
    Bazerque, Juan Andres
    Small, Austin
    Ribeiro, Alejandro
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (08) : 3429 - 3444
  • [32] Reconsidering Stochastic Policy Gradient Methods for Traffic Signal Control
    Kato, Masahiro
    Kojima, Ryosuke
    ADVANCES AND TRENDS IN ARTIFICIAL INTELLIGENCE: THEORY AND APPLICATIONS, IEA-AIE 2024, 2024, 14748 : 442 - 453
  • [33] A Stochastic Policy Gradient Based Adaptive Control for Biped Walking
    Song, Sumian
    Yan, Gangfeng
    Tang, Chong
    Wang, Zidong
    Lin, Zhiyun
    2015 34TH CHINESE CONTROL CONFERENCE (CCC), 2015, : 3224 - 3229
  • [34] COMPARISON OF GRADIENT ESTIMATION TECHNIQUES FOR QUEUES WITH NONIDENTICAL SERVERS
    FU, MC
    HU, JQ
    NAGI, R
    COMPUTERS & OPERATIONS RESEARCH, 1995, 22 (07) : 715 - 729
  • [35] A Temporal-Difference Approach to Policy Gradient Estimation
    Tosatto, Samuele
    Patterson, Andrew
    White, Martha
    Mahmood, A. Rupam
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [36] Adaptive Gradient Estimation Stochastic Parallel Gradient Descent Algorithm for Laser Beam Cleanup
    Ma, Shiqing
    Yang, Ping
    Lai, Boheng
    Su, Chunxuan
    Zhao, Wang
    Yang, Kangjian
    Jin, Ruiyan
    Cheng, Tao
    Xu, Bing
    PHOTONICS, 2021, 8 (05)
  • [37] Infinite-horizon policy-gradient estimation
    Baxter, J
    Bartlett, PL
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2001, 15 : 319 - 350
  • [38] Robust Gradient Estimation Algorithm for a Stochastic System with Colored Noise
    Liu, Wentao
    Xiong, Weili
    INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2023, 21 (02) : 553 - 562
  • [39] Robust Gradient Estimation Algorithm for a Stochastic System with Colored Noise
    Wentao Liu
    Weili Xiong
    International Journal of Control, Automation and Systems, 2023, 21 : 553 - 562
  • [40] A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs
    Mao, Jingkai
    Foerster, Jakob
    Rocktaschel, Tim
    Al-Shedivat, Maruan
    Farquhar, Gregory
    Whiteson, Shimon
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97