Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning

Cited by: 0
Authors
Zhou, Zhengqing [1 ]
Zhou, Zhengyuan [2 ]
Bai, Qinxun [3 ]
Qiu, Linhai [4 ]
Blanchet, Jose [1 ]
Glynn, Peter [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] NYU Stern, New York, NY USA
[3] Horizon Robot, Beijing, Peoples R China
[4] Google, Mountain View, CA USA
Funding
U.S. National Science Foundation (NSF)
Keywords
GO
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
While reinforcement learning has recently seen tremendous success across a wide range of domains, robustness, or the lack thereof, remains an important issue that has not been fully explored. In this paper, we provide a distributionally robust formulation of offline policy learning in tabular RL, which aims to learn, from historical data collected by some other behavior policy, a policy that is robust to a future environment that may deviate from the training environment. We first develop a novel policy evaluation scheme that accurately estimates the robust value of any given policy (i.e., its worst-case value over perturbed environments) and establish a finite-sample bound on its estimation error. Building on this, we then develop a novel, minimax-optimal distributionally robust learning algorithm that achieves O_P(1/√n) regret, meaning that with high probability, the policy learned from n training data points is O(1/√n)-close in robust value to the optimal distributionally robust policy. Finally, our simulation results demonstrate the superiority of our distributionally robust approach over non-robust RL algorithms.
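To make the robust value in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm, of distributionally robust policy evaluation on a tabular MDP. It assumes an (s, a)-rectangular KL-ball ambiguity set of radius delta around the empirical transition model and uses the standard convex dual of the worst-case expectation; all function and variable names here are our own.

# Illustrative sketch (assumed KL ambiguity set, not the paper's exact scheme):
# distributionally robust policy evaluation on a tabular MDP, using the dual
#   inf_{q: KL(q||p) <= delta} E_q[v] = sup_{beta > 0} -beta*log E_p[exp(-v/beta)] - beta*delta.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def robust_expectation(p, v, delta):
    # Worst-case expectation of v over the KL ball of radius delta around p,
    # computed by maximizing the one-dimensional dual over beta > 0.
    if delta == 0.0:
        return float(p @ v)
    def neg_dual(beta):
        # logsumexp with weights b=p gives a numerically stable log E_p[exp(-v/beta)].
        return beta * logsumexp(-v / beta, b=p) + beta * delta
    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e6), method="bounded")
    return -res.fun

def robust_policy_evaluation(P_hat, R, pi, delta, gamma=0.9, tol=1e-8):
    # Fixed-point iteration on the robust Bellman operator for a fixed policy.
    # P_hat: (S, A, S) empirical transitions; R: (S, A) rewards;
    # pi: length-S array of action indices (deterministic policy).
    S = R.shape[0]
    v = np.zeros(S)
    while True:
        v_new = np.array([
            R[s, pi[s]] + gamma * robust_expectation(P_hat[s, pi[s]], v, delta)
            for s in range(S)
        ])
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

With delta = 0 this reduces to ordinary policy evaluation on the empirical model; increasing delta prices in possible model deviation, and it is the gap in such robust value between the learned policy and the best robust policy that the paper bounds at the O_P(1/√n) rate.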
Pages: 11
Related papers
50 records
  • [41] Distributionally Robust Model-based Reinforcement Learning with Large State Spaces
    Ramesh, Shyam Sundhar
    Sessa, Pier Giuseppe
    Hu, Yifan
    Krause, Andreas
    Bogunovic, Ilija
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [42] A Lower Bound for the Sample Complexity of Inverse Reinforcement Learning
    Komanduru, Abi
    Honorio, Jean
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [43] Settling the Sample Complexity of Model-Based Offline Reinforcement Learning
    Li, Gen
    Shi, Laixi
    Chen, Yuxin
    Chi, Yuejie
    Wei, Yuting
    ANNALS OF STATISTICS, 2024, 52 (01): 233-260
  • [44] Sample strategy based on TD-error for offline reinforcement learning
    Zhang L.
    Feng Y.
    Liang X.
    Liu S.
    Cheng G.
    Huang J.
    Gongcheng Kexue Xuebao/Chinese Journal of Engineering, 2023, 45 (12): 2118-2128
  • [46] Corruption-Robust Offline Reinforcement Learning with General Function Approximation
    Ye, Chenlu
    Yang, Rui
    Gu, Quanquan
    Zhang, Tong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [47] Finite-sample Guarantees for Nash Q-learning with Linear Function Approximation
    Cisneros-Velarde, Pedro
    Koyejo, Sanmi
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216: 424-432
  • [48] Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity
    Shi, Laixi
    Li, Gen
    Wei, Yuting
    Chen, Yuxin
    Chi, Yuejie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [49] Sample Complexity of Variance-Reduced Distributionally Robust Q-Learning
    Wang, Shengbo
    Si, Nian
    Blanchet, Jose
    Zhou, Zhengyuan
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [50] Universal Lower Bound for Finite-Sample Reconstruction Error and Its Relation to Prolate Spheroidal Functions
    Gulcu, Talha Cihad
    Ozaktas, Haldun M.
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (01): 50-54