Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning

Cited by: 0
Authors
Zhou, Zhengqing [1 ]
Zhou, Zhengyuan [2 ]
Bai, Qinxun [3 ]
Qiu, Linhai [4 ]
Blanchet, Jose [1 ]
Glynn, Peter [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] NYU Stern, New York, NY USA
[3] Horizon Robot, Beijing, Peoples R China
[4] Google, Mountain View, CA USA
Funding
U.S. National Science Foundation (NSF)
Keywords
GO
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
While reinforcement learning has recently seen tremendous success across a wide range of domains, robustness, or the lack thereof, remains an important issue that has not been fully explored. In this paper, we provide a distributionally robust formulation of offline policy learning in tabular RL, which aims to learn, from historical data collected by some other behavior policy, a policy that is robust to a future environment that may deviate from the training environment. We first develop a novel policy evaluation scheme that accurately estimates the robust value of any given policy (i.e., its worst-case value over perturbed environments) and establish a finite-sample bound on its estimation error. Building on this, we then develop a novel, minimax-optimal distributionally robust learning algorithm that achieves O_P(1/√n) regret, meaning that with high probability, the policy learned from n training data points is O(1/√n)-close in robust value to the optimal distributionally robust policy. Finally, our simulation results demonstrate the superiority of our distributionally robust approach over non-robust RL algorithms.
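To make the robust value in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm, of distributionally robust policy evaluation on a tabular MDP. It assumes an (s, a)-rectangular KL-ball ambiguity set of radius delta around the empirical transition model and uses the standard convex dual of the worst-case expectation; all function and variable names here are our own.

# Illustrative sketch (assumed KL ambiguity set, not the paper's exact scheme):
# distributionally robust policy evaluation on a tabular MDP, using the dual
#   inf_{q: KL(q||p) <= delta} E_q[v] = sup_{beta > 0} -beta*log E_p[exp(-v/beta)] - beta*delta.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def robust_expectation(p, v, delta):
    # Worst-case expectation of v over the KL ball of radius delta around p,
    # computed by maximizing the one-dimensional dual over beta > 0.
    if delta == 0.0:
        return float(p @ v)
    def neg_dual(beta):
        # logsumexp with weights b=p gives a numerically stable log E_p[exp(-v/beta)].
        return beta * logsumexp(-v / beta, b=p) + beta * delta
    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e6), method="bounded")
    return -res.fun

def robust_policy_evaluation(P_hat, R, pi, delta, gamma=0.9, tol=1e-8):
    # Fixed-point iteration on the robust Bellman operator for a fixed policy.
    # P_hat: (S, A, S) empirical transitions; R: (S, A) rewards;
    # pi: length-S array of action indices (deterministic policy).
    S = R.shape[0]
    v = np.zeros(S)
    while True:
        v_new = np.array([
            R[s, pi[s]] + gamma * robust_expectation(P_hat[s, pi[s]], v, delta)
            for s in range(S)
        ])
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

With delta = 0 this reduces to ordinary policy evaluation on the empirical model; increasing delta prices in possible model deviation, and it is the gap in such robust value between the learned policy and the best robust policy that the paper bounds at the O_P(1/√n) rate.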
Pages: 11
Related papers
50 records
  • [41] Distributionally Robust Model-based Reinforcement Learning with Large State Spaces
    Ramesh, Shyam Sundhar
    Sessa, Pier Giuseppe
    Hu, Yifan
    Krause, Andreas
    Bogunovic, Ilija
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [42] A Lower Bound for the Sample Complexity of Inverse Reinforcement Learning
    Komanduru, Abi
    Honorio, Jean
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [43] Settling the Sample Complexity of Model-Based Offline Reinforcement Learning
    Li, Gen
    Shi, Laixi
    Chen, Yuxin
    Chi, Yuejie
    Wei, Yuting
    ANNALS OF STATISTICS, 2024, 52 (01): 233-260
  • [44] Sample strategy based on TD-error for offline reinforcement learning
    Zhang L.
    Feng Y.
    Liang X.
    Liu S.
    Cheng G.
    Huang J.
    Gongcheng Kexue Xuebao/Chinese Journal of Engineering, 2023, 45 (12): 2118-2128
  • [46] Corruption-Robust Offline Reinforcement Learning with General Function Approximation
    Ye, Chenlu
    Yang, Rui
    Gu, Quanquan
    Zhang, Tong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [47] Finite-sample Guarantees for Nash Q-learning with Linear Function Approximation
    Cisneros-Velarde, Pedro
    Koyejo, Sanmi
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216: 424-432
  • [48] Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity
    Shi, Laixi
    Li, Gen
    Wei, Yuting
    Chen, Yuxin
    Chi, Yuejie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [49] Sample Complexity of Variance-Reduced Distributionally Robust Q-Learning
    Wang, Shengbo
    Si, Nian
    Blanchet, Jose
    Zhou, Zhengyuan
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [50] Universal Lower Bound for Finite-Sample Reconstruction Error and Its Relation to Prolate Spheroidal Functions
    Gulcu, Talha Cihad
    Ozaktas, Haldun M.
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (01): 50-54