Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning

Times Cited: 0
Authors
Zhou, Zhengqing [1 ]
Zhou, Zhengyuan [2 ]
Bai, Qinxun [3 ]
Qiu, Linhai [4 ]
Blanchet, Jose [1 ]
Glynn, Peter [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] NYU Stern, New York, NY USA
[3] Horizon Robot, Beijing, Peoples R China
[4] Google, Mountain View, CA USA
Funding
U.S. National Science Foundation;
Keywords
GO;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
While reinforcement learning has recently witnessed tremendous success in a wide range of domains, robustness, or the lack thereof, remains an important issue that has not been fully explored. In this paper, we provide a distributionally robust formulation of offline policy learning in tabular RL, which aims to learn, from historical data collected by some other behavior policy, a policy that is robust to a future environment that may deviate from the training environment. We first develop a novel policy evaluation scheme that accurately estimates the robust value of any given policy (i.e., how well it performs in a perturbed environment) and establish a finite-sample bound on its estimation error. Building on this, we then develop a novel, minimax-optimal distributionally robust learning algorithm that achieves O_P(1/√n) regret, meaning that, with high probability, the policy learned from n training data points will be O(1/√n)-close to the optimal distributionally robust policy. Finally, our simulation results demonstrate the superiority of our distributionally robust approach over non-robust RL algorithms.
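The abstract describes the approach only at a high level. As an illustration only, the following is a minimal Python sketch of distributionally robust value iteration on a tabular MDP, assuming a KL-divergence uncertainty set around an empirical transition model and using the standard scalar dual of the KL-robust expectation. It is not the paper's actual algorithm; the toy MDP, the radius rho, and all function names here are hypothetical.

    # Sketch: distributionally robust value iteration with a KL uncertainty set.
    # Not the paper's algorithm; a generic illustration of the robust Bellman backup.
    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.special import logsumexp

    def kl_robust_expectation(p_hat, v, rho):
        # Worst-case E_P[v] over {P : KL(P || p_hat) <= rho}, computed via the
        # standard scalar dual: sup_{b>0} -b * log E_{p_hat}[exp(-v/b)] - b*rho.
        def neg_dual(log_b):
            b = np.exp(log_b)
            return b * logsumexp(-v / b, b=p_hat) + b * rho
        res = minimize_scalar(neg_dual, bounds=(-10.0, 10.0), method="bounded")
        return -res.fun

    def robust_value_iteration(P_hat, R, gamma=0.9, rho=0.1, iters=200):
        # P_hat: (S, A, S) empirical transition kernel; R: (S, A) reward table.
        S, A, _ = P_hat.shape
        v = np.zeros(S)
        for _ in range(iters):
            q = np.array([[R[s, a] + gamma * kl_robust_expectation(P_hat[s, a], v, rho)
                           for a in range(A)] for s in range(S)])
            v = q.max(axis=1)  # robust Bellman optimality backup
        return v, q.argmax(axis=1)

    # Toy usage: a random 4-state, 2-action MDP standing in for an estimated model.
    rng = np.random.default_rng(0)
    P_hat = rng.dirichlet(np.ones(4), size=(4, 2))
    R = rng.uniform(size=(4, 2))
    v_robust, pi_robust = robust_value_iteration(P_hat, R)
    print(v_robust, pi_robust)

The dual reduces each worst-case expectation to a one-dimensional maximization over the temperature b, which is what keeps the robust Bellman backup tractable in the tabular setting.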
Pages: 11