State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning With Rewards

Cited by: 4
Authors
Calvo-Fullana, Miguel [1 ]
Paternain, Santiago [2 ]
Chamon, Luiz F. O. [3 ]
Ribeiro, Alejandro [4 ]
Affiliations
[1] Univ Pompeu Fabra, Dept Informat & Commun Technol, Barcelona 08002, Spain
[2] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
[3] Univ Stuttgart, Excellence Cluster Simulat Technol, D-70174 Stuttgart, Germany
[4] Univ Penn, Dept Elect & Syst Engn, Philadelphia, PA 19104 USA
Keywords
Reinforcement learning; Trajectory; Task analysis; Monitoring; Optimization; Convergence; Systematics; Autonomous systems; optimization; reinforcement learning; ACTOR-CRITIC ALGORITHM; APPROXIMATION;
DOI
10.1109/TAC.2023.3319070
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812 ;
Abstract
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds. In this class of problems, we show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards. Hence, there exist constrained reinforcement learning problems for which neither regularized nor classical primal-dual methods yield optimal policies. This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods as the portion of the dynamics that drives the multiplier evolution. This approach provides a systematic state augmentation procedure that is guaranteed to solve reinforcement learning problems with constraints. Thus, as we illustrate with an example, while previous methods can fail at finding optimal policies, running the dual dynamics while executing the augmented policy yields an algorithm that provably samples actions from the optimal policy.
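The mechanism summarized in the abstract can be sketched as a simple loop: a policy conditioned on both the environment state and a Lagrange multiplier, with the multiplier evolving by dual ascent on the constraint violation. The toy two-action environment, the step size, and the greedy Lagrangian policy below are illustrative assumptions for intuition only, not the authors' algorithm or guarantees.

```python
import numpy as np

def toy_env(state, action):
    """Toy environment: action 0 pays task reward, action 1 pays constraint reward."""
    next_state = action                     # action deterministically picks next state
    r0 = 1.0 if action == 0 else 0.2        # reward to maximize
    r1 = 1.0 if action == 1 else 0.0        # constrained reward
    return next_state, np.array([r0, r1])

def augmented_policy(state, lam):
    """Greedy policy on the Lagrangian r0 + lam * r1 (an illustrative stand-in
    for a policy trained on the multiplier-augmented state)."""
    payoff = [1.0 + lam * 0.0, 0.2 + lam * 1.0]  # per-action Lagrangian payoff
    return int(np.argmax(payoff))

threshold = 0.4   # require the long-run average of r1 to reach 0.4
eta = 0.05        # dual step size
lam = 0.0         # Lagrange multiplier, part of the augmented state
avg = np.zeros(2) # running average of both rewards

state = 0
for k in range(1, 5001):
    action = augmented_policy(state, lam)
    state, r = toy_env(state, action)
    avg += (r - avg) / k                            # incremental mean
    lam = max(0.0, lam + eta * (threshold - r[1]))  # projected dual ascent

print(round(avg[1], 2))  # long-run average of r1, close to the threshold
```

Note that no fixed multiplier makes the greedy policy mix actions here; it is the oscillation of `lam` under the dual dynamics, fed back into the lambda-dependent policy, that makes the time-average of the constrained reward meet its threshold — the intuition behind running the dual dynamics while executing the augmented policy.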
Pages: 4275-4290
Page count: 16
Related Papers
50 items in total
  • [41] Orientation-Preserving Rewards' Balancing in Reinforcement Learning
    Ren, Jinsheng
    Guo, Shangqi
    Chen, Feng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (11) : 6458 - 6472
  • [42] Reinforcement Learning and Additional Rewards for the Traveling Salesman Problem
    Mele, Umberto Junior
    Chou, Xiaochen
    Gambardella, Luca Maria
    Montemanni, Roberto
    2021 THE 8TH INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING AND APPLICATIONS-EUROPE, ICIEA 2021-EUROPE, 2021, : 198 - 204
  • [43] Deep Reinforcement Learning for Sporadic Rewards with Human Experience
    Sinha, Harshit
    PROCEEDINGS OF THE 2017 IEEE SECOND INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES (ICECCT), 2017,
  • [44] Tentative Exploration on Reinforcement Learning Algorithms for Stochastic Rewards
    Pena, Luis
    LaTorre, Antonio
    Pena, Jose-Maria
    Ossowski, Sascha
    HYBRID ARTIFICIAL INTELLIGENCE SYSTEMS, 2009, 5572 : 336 - 343
  • [45] Off-Policy Reinforcement Learning with Delayed Rewards
    Han, Beining
    Ren, Zhizhou
    Wu, Zuofan
    Zhou, Yuan
    Peng, Jian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [46] Automatic Successive Reinforcement Learning with Multiple Auxiliary Rewards
    Fu, Zhao-Yang
    Zhan, De-Chuan
    Li, Xin-Chun
    Lu, Yi-Xing
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2336 - 2342
  • [47] Doubly constrained offline reinforcement learning for learning path recommendation
    Yun, Yue
    Dai, Huan
    An, Rui
    Zhang, Yupei
    Shang, Xuequn
    KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [49] Learning to soar: Resource-constrained exploration in reinforcement learning
    Chung, Jen Jen
    Lawrance, Nicholas R. J.
    Sukkarieh, Salah
    INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2015, 34 (02): : 158 - 172
  • [50] SOQ: Structural Reinforcement Learning for Constrained Delay Minimization With Channel State Information
    Zhao, Yu
    Kim, Yeongjin
    Lee, Joohyun
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (03) : 4628 - 4644