Safe Reinforcement Learning with Linear Function Approximation

Cited by: 0
Authors
Amani, Sanae [1 ]
Thrampoulidis, Christos [2 ]
Yang, Lin F. [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90024 USA
[2] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC, Canada
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Safety in reinforcement learning has become increasingly important in recent years. Yet, existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to catastrophic results in safety-critical systems, or fail to provide regret guarantees for settings where safety constraints need to be learned. In this paper, we address both problems by first modeling safety as an unknown linear cost function of states and actions, which must always fall below a certain threshold. We then present algorithms, termed SLUCB-QVI and RSLUCB-QVI, for finite-horizon Markov decision processes (MDPs) with linear function approximation. We show that SLUCB-QVI and RSLUCB-QVI, while incurring no safety violation, achieve an Õ(κ√(d³H³T)) regret, nearly matching that of state-of-the-art unsafe algorithms, where H is the duration of each episode, d is the dimension of the feature mapping, κ is a constant characterizing the safety constraints, and T is the total number of actions played. We further present numerical simulations that corroborate our theoretical findings.
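The safety model described in the abstract, an unknown linear cost γᵀφ(s, a) that must stay below a threshold τ, can be illustrated with a minimal sketch. The names (γ, τ, the confidence radius β) and the specific construction below are illustrative assumptions, not the paper's exact algorithm: the idea sketched is to estimate γ by ridge regression from observed costs and to allow only actions whose pessimistic (upper-confidence) cost estimate stays under τ.

```python
import numpy as np

rng = np.random.default_rng(0)
d, tau = 3, 0.5                        # feature dimension and safety threshold (illustrative)
gamma_true = rng.uniform(0.0, 0.4, d)  # unknown linear safety parameter (hypothetical)

# Observe noisy costs for 20 past feature vectors and fit ridge regression.
lam, beta = 1.0, 0.5                   # regularizer and confidence radius (assumed values)
Phi = rng.uniform(0.0, 1.0, (20, d))
costs = Phi @ gamma_true + 0.01 * rng.standard_normal(20)
A = lam * np.eye(d) + Phi.T @ Phi      # regularized Gram matrix of observed features
gamma_hat = np.linalg.solve(A, Phi.T @ costs)
A_inv = np.linalg.inv(A)

def is_safe(phi: np.ndarray) -> bool:
    """Certify phi only if the upper confidence bound on its cost is below tau."""
    ucb = float(phi @ gamma_hat + beta * np.sqrt(phi @ A_inv @ phi))
    return ucb <= tau

# Filter a batch of candidate actions down to the certified-safe set.
candidates = rng.uniform(0.0, 1.0, (5, d))
safe_set = [phi for phi in candidates if is_safe(phi)]
```

Because the check uses an upper confidence bound, an action is only played when the entire confidence set for γ agrees it is safe; this conservatism is what rules out safety violations, at the price of the κ factor in the regret bound.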
Pages: 11
Related Papers
50 records
  • [1] Distributional reinforcement learning with linear function approximation
    Bellemare, Marc G.
    Le Roux, Nicolas
    Castro, Pablo Samuel
    Moitra, Subhodeep
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [2] Parallel reinforcement learning with linear function approximation
    Grounds, Matthew
    Kudenko, Daniel
    ADAPTIVE AGENTS AND MULTI-AGENT SYSTEMS, 2008, 4865 : 60 - 74
  • [3] Exponential Hardness of Reinforcement Learning with Linear Function Approximation
    Kane, Daniel
    Liu, Sihan
    Lovett, Shachar
    Mahajan, Gaurav
    Szepesvari, Csaba
    Weisz, Gellert
    THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
  • [4] Logarithmic Regret for Reinforcement Learning with Linear Function Approximation
    He, Jiafan
    Zhou, Dongruo
    Gu, Quanquan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] Provably Efficient Reinforcement Learning with Linear Function Approximation
    Jin, Chi
    Yang, Zhuoran
    Wang, Zhaoran
    Jordan, Michael I.
    MATHEMATICS OF OPERATIONS RESEARCH, 2023, 48 (03) : 1496 - 1521
  • [6] Differentially Private Reinforcement Learning with Linear Function Approximation
    Zhou, Xingyu
    PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2022, 6 (01)
  • [7] On Reward-Free Reinforcement Learning with Linear Function Approximation
    Wang, Ruosong
    Du, Simon S.
    Yang, Lin F.
    Salakhutdinov, Ruslan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [8] Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation
    Hu, Pihe
    Chen, Yu
    Huang, Longbo
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [9] Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation
    Winnicki, Anna
    Srikant, R.
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 801 - 806