On the Global Optimum Convergence of Momentum-based Policy Gradient

Authors
Ding, Yuhao [1]
Zhang, Junzi [2]
Lavaei, Javad [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Amazon Advertising, San Francisco, CA USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by establishing the first set of global convergence results of stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the soft-max and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum term improves the global optimality sample complexities of vanilla PG methods by $\tilde{O}(\epsilon^{-1.5})$ and $\tilde{O}(\epsilon^{-1})$, respectively, where $\epsilon > 0$ is the target tolerance. Our results for the generic Fisher-non-degenerate policy parametrizations also provide the first single-loop and finite-batch PG algorithm achieving an $\tilde{O}(\epsilon^{-3})$ global optimality sample complexity. Finally, as a byproduct, our analyses provide general tools for deriving the global convergence rates of stochastic PG methods, which can be readily applied and extended to other PG estimators under the two parametrizations.
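To make the idea of a momentum term in a stochastic PG method concrete, the following is a minimal sketch for a soft-max policy on a toy multi-armed bandit. It is not the estimator analyzed in the paper: it assumes a plain exponential-averaging momentum on a REINFORCE-style gradient estimate, and the function names (`softmax`, `momentum_pg`), the reward values, and all hyperparameters are hypothetical choices made for illustration only.

```python
import math
import random


def softmax(theta):
    """Numerically stable soft-max over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def momentum_pg(rewards, steps=500, batch=16, lr=0.2, beta=0.9, seed=0):
    """Soft-max policy gradient with a simple momentum term (illustrative).

    rewards[a] is the deterministic reward of arm a (an assumption made
    to keep the toy problem small). Returns the final action probabilities.
    """
    rng = random.Random(seed)
    n = len(rewards)
    theta = [0.0] * n  # policy logits
    d = [0.0] * n      # momentum-averaged gradient direction
    for _ in range(steps):
        probs = softmax(theta)
        arms = rng.choices(range(n), weights=probs, k=batch)
        # REINFORCE estimate: mean of r(a) * grad log pi(a) over the batch,
        # where grad_i log pi(a) = 1{i == a} - pi(i) for a soft-max policy.
        g = [0.0] * n
        for a in arms:
            for i in range(n):
                indicator = 1.0 if i == a else 0.0
                g[i] += rewards[a] * (indicator - probs[i]) / batch
        # Momentum: exponential average of past gradient estimates.
        d = [beta * di + (1.0 - beta) * gi for di, gi in zip(d, g)]
        # Gradient ascent step on the expected reward.
        theta = [t + lr * di for t, di in zip(theta, d)]
    return softmax(theta)


if __name__ == "__main__":
    print(momentum_pg([0.1, 0.5, 1.0]))
```

With the assumed rewards, the policy is expected to concentrate on the last arm; setting `beta=0.0` recovers the vanilla stochastic update for comparison.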
Pages: 25