On the Global Optimum Convergence of Momentum-based Policy Gradient

Cited: 0
Authors
Ding, Yuhao [1 ]
Zhang, Junzi [2 ]
Lavaei, Javad [1 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Amazon Advertising, San Francisco, CA USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by establishing the first set of global convergence results for stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the soft-max and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum term improves the global optimality sample complexities of vanilla PG methods by $\tilde{O}(\epsilon^{-1.5})$ and $\tilde{O}(\epsilon^{-1})$, respectively, where $\epsilon > 0$ is the target tolerance. Our results for the generic Fisher-non-degenerate policy parametrization also provide the first single-loop and finite-batch PG algorithm achieving an $\tilde{O}(\epsilon^{-3})$ global optimality sample complexity. Finally, as a byproduct, our analyses provide general tools for deriving the global convergence rates of stochastic PG methods, which can be readily applied and extended to other PG estimators under the two parametrizations.
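To make the momentum idea described in the abstract concrete, below is a minimal sketch of a STORM-style momentum policy-gradient update for a tabular soft-max policy on a toy two-state MDP. The environment, the REINFORCE estimator, and the constants (step size eta, momentum weight beta) are illustrative assumptions rather than the paper's exact algorithm or constants, and the importance-sampling correction typically used in variance-reduced PG estimators is omitted for brevity.

    import numpy as np

    # Minimal sketch: momentum-based stochastic policy gradient (STORM-style)
    # for a tabular soft-max policy on a toy 2-state / 2-action MDP.
    # All quantities below are illustrative assumptions, not the paper's setup.
    rng = np.random.default_rng(0)
    n_states, n_actions, horizon, gamma = 2, 2, 20, 0.9

    # Toy MDP: P[s, a] = next-state distribution, R[s, a] = reward.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.3, 0.7], [0.6, 0.4]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 1.0]])

    def softmax(logits):
        z = np.exp(logits - logits.max())
        return z / z.sum()

    def sample_trajectory(theta):
        """Roll out one trajectory under the soft-max policy pi_theta."""
        s, states, actions, rewards = 0, [], [], []
        for _ in range(horizon):
            a = rng.choice(n_actions, p=softmax(theta[s]))
            states.append(s); actions.append(a); rewards.append(R[s, a])
            s = rng.choice(n_states, p=P[s, a])
        return states, actions, rewards

    def reinforce_gradient(theta, states, actions, rewards):
        """REINFORCE estimator of the policy gradient (soft-max parametrization)."""
        grad = np.zeros_like(theta)
        returns = np.array([sum(gamma**k * r for k, r in enumerate(rewards[t:]))
                            for t in range(len(rewards))])
        for t, (s, a) in enumerate(zip(states, actions)):
            score = -softmax(theta[s])
            score[a] += 1.0                      # grad of log pi(a|s) w.r.t. theta[s]
            grad[s] += gamma**t * returns[t] * score
        return grad

    theta = np.zeros((n_states, n_actions))
    d = np.zeros_like(theta)                      # momentum direction
    eta, beta = 0.05, 0.2                         # illustrative step size / momentum weight

    for it in range(200):
        states, actions, rewards = sample_trajectory(theta)
        g_new = reinforce_gradient(theta, states, actions, rewards)
        if it == 0:
            d = g_new
        else:
            # STORM-style momentum: evaluate the gradient of the previous iterate
            # on the same trajectory and use the difference to correct the running
            # direction (a full estimator would also re-weight by importance sampling).
            g_old = reinforce_gradient(theta_prev, states, actions, rewards)
            d = g_new + (1.0 - beta) * (d - g_old)
        theta_prev = theta.copy()
        theta = theta + eta * d                   # gradient ascent on expected return

The single-loop structure (one trajectory per iteration, no inner batch loop) mirrors the "single-loop and finite-batch" property highlighted in the abstract, though the specific estimator here is only a generic sketch.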
Pages: 25
Related Papers
50 records in total
  • [21] GENERALIZED MOMENTUM-BASED METHODS: A HAMILTONIAN PERSPECTIVE
    Diakonikolas, Jelena
    Jordan, Michael I.
    SIAM JOURNAL ON OPTIMIZATION, 2021, 31 (01) : 915 - 944
  • [22] Momentum-Based Contextual Federated Reinforcement Learning
    Yue, Sheng
    Hua, Xingyuan
    Deng, Yongheng
    Chen, Lili
    Ren, Ju
    Zhang, Yaoxue
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024
  • [23] Momentum-Based Adaptive Laws for Identification and Control
    Somers, Luke
    Haddad, Wassim M.
    AEROSPACE, 2024, 11 (12)
  • [24] A momentum-based approach to learning Nash equilibria
    Zhang, Huaxiang
    Liu, Peide
    AGENT COMPUTING AND MULTI-AGENT SYSTEMS, 2006, 4088 : 528 - 533
  • [25] Momentum-Based Topology Estimation of Articulated Objects
    Tirupachuri, Yeshasvi
    Traversaro, Silvio
    Nori, Francesco
    Pucci, Daniele
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, 2020, 1038 : 1093 - 1105
  • [26] Momentum-based parameterization of dynamic character motion
    Abe, YH
    Liu, CK
    Popovic, Z
    GRAPHICAL MODELS, 2006, 68 (02) : 194 - 211
  • [27] Practical and Fast Momentum-Based Power Methods
    Rabbani, Tahseen
    Jain, Apollo
    Rajkumar, Arjun
    Huang, Furong
    MATHEMATICAL AND SCIENTIFIC MACHINE LEARNING, VOL 145, 2021, 145 : 721 - 756
  • [28] Momentum-based distributed gradient tracking algorithms for distributed aggregative optimization over unbalanced directed graphs
    Wang, Zhu
    Wang, Dong
    Lian, Jie
    Ge, Hongwei
    Wang, Wei
    AUTOMATICA, 2024, 164
  • [29] Momentum-based variance-reduced stochastic Bregman proximal gradient methods for nonconvex nonsmooth optimization
    Liao, Shichen
    Liu, Yan
    Han, Congying
    Guo, Tiande
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 266
  • [30] Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
    Cen, Shicong
    Cheng, Chen
    Chen, Yuxin
    Wei, Yuting
    Chi, Yuejie
    OPERATIONS RESEARCH, 2021, 70 (04) : 2563 - 2578