Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning

Cited by: 0
Authors
Zhou, Zhengqing [1 ]
Zhou, Zhengyuan [2 ]
Bai, Qinxun [3 ]
Qiu, Linhai [4 ]
Blanchet, Jose [1 ]
Glynn, Peter [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] NYU Stern, New York, NY USA
[3] Horizon Robot, Beijing, Peoples R China
[4] Google, Mountain View, CA USA
Funding
U.S. National Science Foundation
Keywords
GO
DOI
N/A
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
While reinforcement learning has recently witnessed tremendous success across a wide range of domains, robustness, or the lack thereof, remains an important issue that has not been fully explored. In this paper, we provide a distributionally robust formulation of offline policy learning in tabular RL, which aims to learn, from historical data collected by some other behavior policy, a policy that is robust to a future environment that can deviate from the training environment. We first develop a novel policy evaluation scheme that accurately estimates the robust value of any given policy (i.e., how well it performs in a perturbed environment) and establish its finite-sample estimation error. Building on this, we then develop a novel and minimax-optimal distributionally robust learning algorithm that achieves O_P(1/√n) regret, meaning that with high probability, the policy learned from n training data points will be O(1/√n) close to the optimal distributionally robust policy. Finally, our simulation results demonstrate the superiority of our distributionally robust approach over non-robust RL algorithms.
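The abstract does not spell out the algorithm, but a standard way to instantiate distributionally robust tabular RL is to place a Kullback-Leibler (KL) uncertainty set of radius delta around each empirically estimated transition distribution and compute worst-case expectations through the one-dimensional dual sup_{beta > 0} { -beta log E_{p_hat}[exp(-v/beta)] - beta*delta }. The sketch below is a minimal illustration under those assumptions, not the paper's exact procedure; the function names, the choice of KL divergence, and the toy MDP numbers are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def kl_worst_case_expectation(v, p_hat, delta):
    """Worst-case expectation of v over the KL ball {p : KL(p || p_hat) <= delta},
    computed via its one-dimensional dual:
        sup_{beta > 0}  -beta * log E_{p_hat}[exp(-v / beta)] - beta * delta.
    """
    def neg_dual(beta):
        shifted = -v / beta
        m = shifted.max()  # log-sum-exp shift for numerical stability
        log_mgf = m + np.log(np.dot(p_hat, np.exp(shifted - m)))
        return beta * log_mgf + beta * delta  # negative of the dual objective

    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e6), method="bounded")
    return -res.fun


def robust_value_iteration(P_hat, R, gamma, delta, iters=500, tol=1e-8):
    """Distributionally robust value iteration on an estimated tabular MDP.

    P_hat : (S, A, S) empirical transition probabilities
    R     : (S, A) reward table
    delta : KL radius of the uncertainty set at every (s, a) pair
    Returns the robust value function and a greedy robust policy.
    """
    n_states, n_actions, _ = P_hat.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                q[s, a] = R[s, a] + gamma * kl_worst_case_expectation(
                    v, P_hat[s, a], delta)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return v, q.argmax(axis=1)


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP with hypothetical empirical estimates.
    P_hat = np.array([[[0.9, 0.1], [0.2, 0.8]],
                      [[0.7, 0.3], [0.05, 0.95]]])
    R = np.array([[1.0, 0.0], [0.5, 2.0]])
    v_rob, pi_rob = robust_value_iteration(P_hat, R, gamma=0.9, delta=0.1)
    print("robust values:", v_rob, "robust policy:", pi_rob)
```

Setting delta = 0 recovers ordinary value iteration on the empirical model, while larger delta yields increasingly conservative policies.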
Pages: 11
Related Papers
50 records in total
  • [1] A Finite Sample Complexity Bound for Distributionally Robust Q-learning
    Wang, Shengbo
    Si, Nian
    Blanchet, Jose
    Zhou, Zhengyuan
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023
  • [2] Finite-Sample Guarantees for Wasserstein Distributionally Robust Optimization: Breaking the Curse of Dimensionality
    Gao, Rui
    OPERATIONS RESEARCH, 2023, 71 (06) : 2291 - 2306
  • [3] Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity
    Shi, Laixi
    Chi, Yuejie
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [4] Towards finite-sample convergence of direct reinforcement learning
Lim, S. H.
DeJong, G.
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 230 - 241
  • [5] A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms
Chen, Z.
PERFORMANCE EVALUATION REVIEW, 2023, 50 (03): 12 - 15
  • [6] Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
    Chen, Zaiwei
    Zhang, Sheng
    Doan, Thinh T.
    Clarke, John-Paul
    Maguluri, Siva Theja
    AUTOMATICA, 2022, 146
  • [7] Fast Rates for the Regret of Offline Reinforcement Learning
    Hu, Yichun
    Kallus, Nathan
    Uehara, Masatoshi
    MATHEMATICS OF OPERATIONS RESEARCH, 2025, 50 (01)
  • [8] Maximum Mean Discrepancy Distributionally Robust Nonlinear Chance-Constrained Optimization with Finite-Sample Guarantee
    Nemmour, Yassine
    Kremer, Heiner
    Schoelkopf, Bernhard
    Zhu, Jia-Jie
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 5660 - 5667
  • [9] Finite-Sample Analysis for Decentralized Batch Multiagent Reinforcement Learning With Networked Agents
    Zhang, Kaiqing
    Yang, Zhuoran
    Liu, Han
    Zhang, Tong
    Basar, Tamer
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (12) : 5925 - 5940
  • [10] Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
    Blanchet, Jose
    Lu, Miao
    Zhang, Tong
    Zhong, Han
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023