State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning With Rewards

Cited by: 4
Authors
Calvo-Fullana, Miguel [1 ]
Paternain, Santiago [2 ]
Chamon, Luiz F. O. [3 ]
Ribeiro, Alejandro [4 ]
Affiliations
[1] Univ Pompeu Fabra, Dept Informat & Commun Technol, Barcelona 08002, Spain
[2] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
[3] Univ Stuttgart, Excellence Cluster Simulat Technol, D-70174 Stuttgart, Germany
[4] Univ Penn, Dept Elect & Syst Engn, Philadelphia, PA 19104 USA
Keywords
Reinforcement learning; Trajectory; Task analysis; Monitoring; Optimization; Convergence; Systematics; Autonomous systems; Actor-critic algorithm; Approximation
DOI
10.1109/TAC.2023.3319070
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds. In this class of problems, we show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards. Hence, there exist constrained reinforcement learning problems for which neither regularized nor classical primal-dual methods yield optimal policies. This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods as the portion of the dynamics that drives the multiplier evolution. This approach provides a systematic state augmentation procedure that is guaranteed to solve reinforcement learning problems with constraints. Thus, as we illustrate with an example, whereas previous methods can fail to find optimal policies, running the dual dynamics while executing the augmented policy yields an algorithm that provably samples actions from the optimal policy.
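To make the mechanism in the abstract concrete, the following is a minimal Python sketch of a state-augmented primal-dual loop: the policy conditions on the current Lagrange multipliers in addition to the environment state, and the multipliers evolve via dual gradient dynamics on the constraint violations. The env and policy interfaces, step sizes, and reward conventions below are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def state_augmented_primal_dual(env, policy, thresholds, eta=0.05,
                                    epoch_len=200, n_epochs=100):
        """Hypothetical interfaces: env.reset() -> state;
        env.step(action) -> (state, reward, constraint_rewards);
        policy(state, lam) -> action. Constraint i asks that
        constraint_rewards[i] accumulate to at least thresholds[i]
        on average."""
        lam = np.zeros(len(thresholds))        # Lagrange multipliers (dual variables)
        state = env.reset()
        for _ in range(n_epochs):
            acc = np.zeros(len(thresholds))    # accumulated constraint rewards
            for _ in range(epoch_len):
                action = policy(state, lam)    # policy sees the augmented state (state, lam)
                state, _, c = env.step(action)
                acc += np.asarray(c)
            # Dual dynamics: raise lam_i when constraint i falls short of its
            # threshold, then project back onto the nonnegative orthant.
            lam = np.maximum(lam + eta * (thresholds - acc / epoch_len), 0.0)
        return lam

Note that, in the spirit of the paper's result, the augmented policy is executed while the multipliers keep evolving; the loop does not wait for the dual variables to converge before acting.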
Pages: 4275-4290
Number of pages: 16