Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning

Cited by: 0
Authors:
Zhou, Zhengqing [1 ]
Zhou, Zhengyuan [2 ]
Bai, Qinxun [3 ]
Qiu, Linhai [4 ]
Blanchet, Jose [1 ]
Glynn, Peter [1 ]
Affiliations:
[1] Stanford Univ, Stanford, CA 94305 USA
[2] NYU Stern, New York, NY USA
[3] Horizon Robotics, Beijing, People's Republic of China
[4] Google, Mountain View, CA USA
Funding: US National Science Foundation
Keywords: GO
DOI: Not available
CLC Number: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
While reinforcement learning has recently witnessed tremendous success across a wide range of domains, robustness, or the lack thereof, remains an important issue that has not been fully explored. In this paper, we provide a distributionally robust formulation of offline policy learning in tabular RL, which aims to learn, from historical data collected by some other behavior policy, a policy that is robust to future environments that may deviate from the training environment. We first develop a novel policy evaluation scheme that accurately estimates the robust value of any given policy (i.e., how well it performs in a perturbed environment) and establish its finite-sample estimation error. Building on this, we then develop a novel and minimax-optimal distributionally robust learning algorithm that achieves $O_P(1/\sqrt{n})$ regret, meaning that with high probability, the policy learned from $n$ training data points will be $O(1/\sqrt{n})$ close to the optimal distributionally robust policy. Finally, our simulation results demonstrate the superiority of our distributionally robust approach over non-robust RL algorithms.
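To make the abstract's first step concrete, below is a minimal Python sketch of distributionally robust policy evaluation in a tabular MDP. It is illustrative rather than the authors' exact algorithm: it assumes a KL-divergence uncertainty ball of radius delta around each row of the empirical transition model and computes the worst-case expectation via the standard convex dual of the KL-constrained problem; all names here (robust_expectation, robust_policy_value, delta, gamma) are hypothetical.

```python
# Illustrative sketch (not the paper's exact algorithm): robust policy
# evaluation in a tabular MDP under an assumed KL uncertainty set.
# Uses the standard dual identity
#   inf_{P : KL(P || p0) <= delta} E_P[v]
#     = sup_{alpha > 0} ( -alpha * log E_{p0}[exp(-v / alpha)] - alpha * delta ).
import numpy as np
from scipy.optimize import minimize_scalar


def robust_expectation(p0, v, delta):
    """Worst-case mean of v over the KL ball {P : KL(P || p0) <= delta}."""
    if delta == 0.0:
        return float(p0 @ v)

    def neg_dual(alpha):
        # Negated dual objective, with a numerically stable log-sum-exp.
        shifted = -v / alpha
        m = shifted.max()
        log_mgf = m + np.log(p0 @ np.exp(shifted - m))
        return alpha * log_mgf + alpha * delta

    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e6), method="bounded")
    return -res.fun


def robust_policy_value(P_hat, R, pi, gamma=0.9, delta=0.1, tol=1e-8, max_iters=1000):
    """Robust value of a deterministic policy pi via robust Bellman iteration.

    P_hat: empirical transitions, shape (S, A, S); R: rewards, shape (S, A);
    pi: action index per state, shape (S,). The robust Bellman operator is a
    gamma-contraction, so this fixed-point iteration converges.
    """
    S, _, _ = P_hat.shape
    v = np.zeros(S)
    for _ in range(max_iters):
        v_new = np.array([
            R[s, pi[s]] + gamma * robust_expectation(P_hat[s, pi[s]], v, delta)
            for s in range(S)
        ])
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


# Tiny usage example with a random 3-state, 2-action empirical model.
rng = np.random.default_rng(0)
P_hat = rng.dirichlet(np.ones(3), size=(3, 2))  # empirical transitions, (S, A, S)
R = rng.random((3, 2))                          # rewards, (S, A)
pi = np.array([0, 1, 0])                        # a fixed deterministic policy
print(robust_policy_value(P_hat, R, pi, delta=0.05))
```

Replacing the fixed pi[s] with a max over actions inside the backup turns this evaluation scheme into a robust value iteration, which is the kind of learning algorithm to which the abstract's $O_P(1/\sqrt{n})$ regret guarantee refers.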
Pages: 11
Related Papers (50 total):
  • [21] Chen, Yinyin; He, Shishuang; Yang, Yun; Liang, Feng. Learning Topic Models: Identifiability and Finite-Sample Analysis. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, 118(544): 2860-2875.
  • [22] Tsiamis, A.; Ziemann, I.; Matni, N.; Pappas, G. J. Statistical Learning Theory for Control: A Finite-Sample Perspective. IEEE CONTROL SYSTEMS, 2023, 43(06): 67-97.
  • [23] Zhang, Kaiqing; Yang, Zhuoran; Liu, Han; Zhang, Tong; Basar, Tamer. Finite-Sample Analysis for Decentralized Cooperative Multi-Agent Reinforcement Learning from Batch Data. IFAC PAPERSONLINE, 2020, 53(02): 1049-1056.
  • [24] Huang, Ruitong; Szepesvari, Csaba. A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models. ARTIFICIAL INTELLIGENCE AND STATISTICS, 2014, 33: 402-410.
  • [25] Xu, Mengdi; Huang, Peide; Niu, Yaru; Kumar, Visak; Qiu, Jielin; Fang, Chao; Lee, Kuan-Hui; Qi, Xuewei; Lam, Henry; Li, Bo; Zhao, Ding. Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, 2023, 206.
  • [26] Guo, Siyuan; Zou, Lixin; Chen, Hechang; Qu, Bohao; Chi, Haotian; Yu, Philip S.; Chang, Yi. Sample Efficient Offline-to-Online Reinforcement Learning. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36(03): 1299-1310.
  • [27] Ishii, Ryota; Maruo, Kazushi; Doi, Masaaki; Gosho, Masahiko. Finite-Sample Performance of the Robust Variance Estimator in the Presence of Missing Data. COMMUNICATIONS IN STATISTICS - SIMULATION AND COMPUTATION, 2024, 53(06): 2692-2703.
  • [28] VanderKraats, Nathan D.; Banerjee, Arunava. A Finite-Sample, Distribution-Free, Probabilistic Lower Bound on Mutual Information. NEURAL COMPUTATION, 2011, 23(07): 1862-1898.
  • [29] Yang, Rui; Bai, Chenjia; Ma, Xiaoteng; Wang, Zhaoran; Zhang, Chongjie; Han, Lei. RORL: Robust Offline Reinforcement Learning via Conservative Smoothing. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022.
  • [30] Li, Dan; Fooladivanda, Dariush; Martinez, Sonia. Online Learning of Parameterized Uncertain Dynamical Environments with Finite-Sample Guarantees. IEEE CONTROL SYSTEMS LETTERS, 2021, 5(04): 1351-1356.