Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning

Cited by: 0
Authors
Zhou, Zhengqing [1 ]
Zhou, Zhengyuan [2 ]
Bai, Qinxun [3 ]
Qiu, Linhai [4 ]
Blanchet, Jose [1 ]
Glynn, Peter [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] NYU Stern, New York, NY USA
[3] Horizon Robot, Beijing, Peoples R China
[4] Google, Mountain View, CA USA
Funding
U.S. National Science Foundation
Keywords
GO
DOI
N/A
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
While reinforcement learning has recently witnessed tremendous success across a wide range of domains, robustness, or the lack thereof, remains an important issue that has not been fully explored. In this paper, we provide a distributionally robust formulation of offline policy learning in tabular RL, which aims to learn, from historical data collected by some other behavior policy, a policy that is robust to a future environment that can deviate from the training environment. We first develop a novel policy evaluation scheme that accurately estimates the robust value of any given policy (i.e., how well it performs in a perturbed environment) and establish its finite-sample estimation error. Building on this, we then develop a novel and minimax-optimal distributionally robust learning algorithm that achieves O_P(1/√n) regret, meaning that with high probability, the policy learned from n training data points will be O(1/√n) close to the optimal distributionally robust policy. Finally, our simulation results demonstrate the superiority of our distributionally robust approach over non-robust RL algorithms.
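The abstract does not spell out the algorithm, but a standard way to instantiate distributionally robust tabular RL is to place a Kullback-Leibler (KL) uncertainty set of radius delta around each empirically estimated transition distribution and compute worst-case expectations through the one-dimensional dual sup_{beta > 0} { -beta log E_{p_hat}[exp(-v/beta)] - beta*delta }. The sketch below is a minimal illustration under those assumptions, not the paper's exact procedure; the function names, the choice of KL divergence, and the toy MDP numbers are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def kl_worst_case_expectation(v, p_hat, delta):
    """Worst-case expectation of v over the KL ball {p : KL(p || p_hat) <= delta},
    computed via its one-dimensional dual:
        sup_{beta > 0}  -beta * log E_{p_hat}[exp(-v / beta)] - beta * delta.
    """
    def neg_dual(beta):
        shifted = -v / beta
        m = shifted.max()  # log-sum-exp shift for numerical stability
        log_mgf = m + np.log(np.dot(p_hat, np.exp(shifted - m)))
        return beta * log_mgf + beta * delta  # negative of the dual objective

    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e6), method="bounded")
    return -res.fun


def robust_value_iteration(P_hat, R, gamma, delta, iters=500, tol=1e-8):
    """Distributionally robust value iteration on an estimated tabular MDP.

    P_hat : (S, A, S) empirical transition probabilities
    R     : (S, A) reward table
    delta : KL radius of the uncertainty set at every (s, a) pair
    Returns the robust value function and a greedy robust policy.
    """
    n_states, n_actions, _ = P_hat.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                q[s, a] = R[s, a] + gamma * kl_worst_case_expectation(
                    v, P_hat[s, a], delta)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return v, q.argmax(axis=1)


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP with hypothetical empirical estimates.
    P_hat = np.array([[[0.9, 0.1], [0.2, 0.8]],
                      [[0.7, 0.3], [0.05, 0.95]]])
    R = np.array([[1.0, 0.0], [0.5, 2.0]])
    v_rob, pi_rob = robust_value_iteration(P_hat, R, gamma=0.9, delta=0.1)
    print("robust values:", v_rob, "robust policy:", pi_rob)
```

Setting delta = 0 recovers ordinary value iteration on the empirical model, while larger delta yields increasingly conservative policies.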
Pages: 11
Related Papers
50 records in total
  • [1] A Finite Sample Complexity Bound for Distributionally Robust Q-learning
    Wang, Shengbo
    Si, Nian
    Blanchet, Jose
    Zhou, Zhengyuan
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023
  • [2] Finite-Sample Guarantees for Wasserstein Distributionally Robust Optimization: Breaking the Curse of Dimensionality
    Gao, Rui
    OPERATIONS RESEARCH, 2023, 71 (06) : 2291 - 2306
  • [3] Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity
    Shi, Laixi
    Chi, Yuejie
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [4] Towards finite-sample convergence of direct reinforcement learning
Lim, S. H.
DeJong, G.
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 230 - 241
  • [5] A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms
Chen, Z.
PERFORMANCE EVALUATION REVIEW, 2023, 50 (03): 12 - 15
  • [6] Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
    Chen, Zaiwei
    Zhang, Sheng
    Doan, Thinh T.
    Clarke, John-Paul
    Maguluri, Siva Theja
    AUTOMATICA, 2022, 146
  • [7] Fast Rates for the Regret of Offline Reinforcement Learning
    Hu, Yichun
    Kallus, Nathan
    Uehara, Masatoshi
    MATHEMATICS OF OPERATIONS RESEARCH, 2025, 50 (01)
  • [8] Maximum Mean Discrepancy Distributionally Robust Nonlinear Chance-Constrained Optimization with Finite-Sample Guarantee
    Nemmour, Yassine
    Kremer, Heiner
    Schoelkopf, Bernhard
    Zhu, Jia-Jie
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 5660 - 5667
  • [9] Finite-Sample Analysis for Decentralized Batch Multiagent Reinforcement Learning With Networked Agents
    Zhang, Kaiqing
    Yang, Zhuoran
    Liu, Han
    Zhang, Tong
    Basar, Tamer
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (12) : 5925 - 5940
  • [10] Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
    Blanchet, Jose
    Lu, Miao
    Zhang, Tong
    Zhong, Han
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023